Abstract


Internet access speeds of large enterprises and educational institutions have improved dramatically
over the past few years. However, this higher-speed connectivity is still ineffective at providing
end-users with good download performance and robustness against service interruptions. This arises
due to the prevalence of constrained links with little spare capacity inside Internet Service Provider
(ISP) networks.

In this dissertation, we first investigate the location, latency and traffic load characteristics of
network links that limit the Internet performance of well-connected end-networks. More importantly,
we show how end-networks can employ a clever Internet route selection technique, called
Multihoming Route Control, to avoid these performance bottlenecks and obtain much better Internet
performance. Using Internet-scale measurements conducted over Akamai’s content distribution
infrastructure, we show that by multihoming to three ISPs, and intelligently scheduling transfers
across the ISPs, an end-network could potentially improve its Internet round-trip times (RTTs),
throughputs and reliability by as much as 30%.

We also compare the Internet performance and reliability from route control against more powerful
route selection paradigms such as overlay routing. We show that the RTTs and transfer speeds
from multihoming are within 5-10% of those from overlay routing. While multihoming cannot offer
the nearly perfect resilience of overlays, we show that it can eliminate almost all failures
experienced by a singly-homed end-network. We also describe the design and performance evaluation
of a route control system that can be deployed by large multihomed enterprises. We show that, in
practice, simple route control techniques can offer Web performance within 10% of the optimal
performance from multihoming.

Finally, we investigate whether, in the future Internet, techniques such as route control or overlay
routing can still provide good end-to-end performance in the face of higher access speeds and a
vastly different traffic mix. We show that the structure of the Internet (i.e., a power law degree
structure at the ISP level), together with the routing protocol (i.e., BGP), will convert certain key
portions of the network into persistent bottlenecks. We then consider modifications to the ISP-level
interconnections to guarantee good end-to-end performance in the future Internet.

We believe that the contributions in this thesis significantly advance the state-of-the-art of
techniques for improving Internet performance and resilience. Further, this dissertation highlights
important guidelines for the design of inter-domain routing protocols and peering architectures.

Chapter 1


Introduction 


Over the past decade, Internet access speeds of large organizations, such as enterprises, data centers
and universities, have improved tremendously. Several years ago, organizations and end-users alike
were limited to network connection speeds of a few Kbps. Since then, technological breakthroughs
such as fiber-optic links, coupled with growing demand for bandwidth from newer applications,
have fueled rapid improvements in endpoint access speeds. Nowadays, most organizations deploy
very high speed connections, often 100Mbps or more, to ensure robust performance of their
“mission critical” network applications.

End-user access speeds have also improved substantially over the same time period. In Figure 1.1,
we show the evolution of home access speeds in the United States. Market studies predict
that the adoption of 100Mbps broadband home connections in the U.S. will begin as early as the year
2010. In fact, home users in Japan and South Korea already enjoy impressive connection speeds on
the order of several tens of Mbps [22, 116].

When access speeds are limited, the Internet experience of end-users is tightly coupled with their
Internet connection speeds: a 28Kbps modem line can be expected to provide poorer download
speeds and availability than a DSL link (especially when the client opens multiple Web connections
in parallel). Similarly, Web pages take longer to load and data transfers take longer to complete
with a 1.5Mbps home DSL connection than with a T3 connection (45Mbps) in an office environment.
Naturally, this leads one to think that improved access speeds in the near future will drastically
improve the Internet experience of end-users.

But is this really true? Let us suppose that an enterprise upgraded its high-speed Internet
connection from 10Mbps to 100Mbps, or even 1Gbps. Informally, we refer to end-networks with
high-speed Internet access as being well-connected. Then, the above question translates to whether
the Internet access performance of end-users in well-connected organizations, specifically their
observed download speeds and resilience to service interruptions, would improve proportionately.

Of course, higher speed connections will offer better-than-reasonable performance and resilience.
But it is unclear if future well-connected end-users and end-networks will see the bang for their
buck, i.e., matching performance for their higher access speeds. Here, and henceforth, when the
context is unambiguous, we use the term “performance” to collectively refer to Internet download
speeds, response times for Internet transfers and resilience to service interruptions, as observed
by network endpoints.

Figure 1.1: Evolution of home access speeds: We show when various home Internet access technologies
(specifically, the corresponding access speeds) were adopted in the U.S. We also show the expected
adoption dates for higher speed home access, such as 24Mbps and 100Mbps.

Unfortunately, as the connection speed of a network endpoint grows, its performance will most
likely not improve proportionately. In fact, it is highly likely that a 1Gbps uplink, for example,
will offer no better download speeds and response times to an endpoint than a 100Mbps link. As
we discuss next, this arises due to the prevalence of key performance “road-blocks”, or bottlenecks,
which limit the Internet experience of well-connected endpoints.

Our focus in the forthcoming discussion, and in the rest of the dissertation, will be on
end-networks, such as enterprises, universities and Web server hosting centers: we will use the term
“endpoints” to collectively refer to these network entities. The observations and proposals we make
for end-networks in this dissertation can also be extended to individual end-hosts.

1.1 Road-blocks to Efficient Internet Performance


There are several key bottlenecks that prevent a well-connected endpoint from realizing the best
possible performance from its high-speed access connections. In general, these bottlenecks can be
broadly classified into two categories: access and non-access bottlenecks.

The first category includes performance bottlenecks that are under the control of the network
endpoints themselves, a simple example being ill-configured software stacks at end-hosts. Internet
transfer speeds, response times, and even resilience, to some extent, may intrinsically depend on
the software stacks employed by end-hosts. For example, several past studies have identified the
impact of the receive buffer size setting in endpoint Transmission Control Protocol (TCP) stacks
on the throughput achieved by TCP connections [101, 84]. Small receive buffer sizes constrain
the amount of data a transmitter can have in flight to a receiver at any one time. As another
illustration, older implementations of TCP, such as TCP Reno, are less resilient to packet drops in
the network than later variants such as TCP SACK. As a result, clients employing TCP Reno may observe
poorer network performance than SACK clients, especially when the traffic load on the network is
high.
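
As a rough illustration of why the receive buffer matters: a TCP sender can have at most one
receive window of unacknowledged data outstanding per round trip, so throughput is bounded by the
window size divided by the RTT. The sketch below computes this bound. The function name and the
example values (a 64 KB buffer and a 100 ms RTT) are illustrative assumptions, not measurements
from this dissertation.

```python
def max_tcp_throughput_bps(recv_buffer_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on TCP throughput imposed by the receive window.

    At most one window of data can be outstanding per round trip, so
    throughput <= window / RTT. Packet loss and congestion control are
    ignored, so real transfers may be slower still.
    """
    if rtt_seconds <= 0:
        raise ValueError("RTT must be positive")
    return recv_buffer_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    # Assumed example: 64 KB receive buffer, 100 ms round-trip time.
    bound = max_tcp_throughput_bps(64 * 1024, 0.100)
    print(f"window-limited throughput: {bound / 1e6:.2f} Mbps")
```

Under these assumed values the bound is roughly 5 Mbps, far below the capacity of a 100Mbps access
link, no matter how fast the link itself is.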

An important feature of access bottlenecks is that, being under the administrative control of
endpoints, they can be easily overcome by means of simple local upgrades, such as updating system
software on host machines, and careful configuration. With sufficiently high speed access links,
up-to-date software, and perfect configuration, then, the performance of well-connected, well-managed
endpoints is limited only by non-access bottlenecks.

1.1.1  Non-access  Bottlenecks 

Non-access bottlenecks will persist irrespective of the access speeds or software upgrades at network
endpoints. Essentially, these bottlenecks arise from inefficient design, operation and management of
the wide-area Internet as a whole, or of key portions of it. Overcoming these bottlenecks therefore
requires coordinated network-wide upgrades and management, a much more formidable task
than local improvements. A key example of such non-access bottlenecks is constrained wide-area
hot-spot links. Other examples include failures in critical network-wide services, such as the
Internet’s Domain Name System (DNS). More often than not, factors such as DNS failures tend to
preclude or delay the start of new Internet transfers. We do not discuss such factors here. Instead,
we focus on bottlenecks that may impact both new and ongoing Internet transfers.

Internet transfers will have to share network resources, such as raw link capacities and router
buffers, with millions of other flows in the network. If the volume of this competing data traffic is
low, the only major constraint on the performance of a data transfer is likely to be the raw capacity
of the network links it traverses (assuming no software limitations). Often, since the raw speeds of
most access links are much lower than those of links elsewhere in the network, i.e., of “non-access”
links, the performance of the data transfer is limited primarily by the source or the destination
connection speeds.

However, many non-access links today carry several hundreds of megabits of data traffic per second,
especially during peak hours. Internet backbone providers, which own these non-access links,
employ a variety of mechanisms to protect the links from traffic overload. But the unpredictability
of traffic, as well as conflicting traffic management goals of competing providers, often leaves key
links in the network very congested, and with little “head room” or spare capacity for additional
traffic. A simple fix for congestion at such hotspots would be to add capacity
wherever necessary. However, this approach is not economically viable. Moreover, selective local
improvements to link speeds may simply push hotspots to other network locations.

In other words, the performance of transfers initiated by well-connected endpoints depends critically
on the volume of competing traffic on each non-access link they traverse, and the variation in
the traffic volumes with time. Non-access links running at close-to-full capacity may drive endpoint
transfer speeds close to zero, force packet drops, and cause connections to time out or even fail.
In general, we will refer to non-access links with heavy traffic load that limit the Internet
performance of well-connected endpoints as wide-area bottleneck links. When the context is clear, we
will also use the more generic terms non-access bottlenecks or wide-area bottlenecks to refer to
wide-area bottleneck links.
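
To make the notion of a bottleneck link concrete, the following sketch treats the spare capacity
of a link as its raw capacity minus its current load, and identifies the bottleneck of a path as
the link with the least spare capacity. The class and function names, and all capacities and loads
in the example, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Link:
    name: str
    capacity_mbps: float  # raw link speed
    load_mbps: float      # competing traffic currently on the link

    @property
    def spare_mbps(self) -> float:
        """Capacity left over for an additional transfer (never negative)."""
        return max(self.capacity_mbps - self.load_mbps, 0.0)


def bottleneck(path: Sequence[Link]) -> Link:
    """Return the link with the least spare capacity along the path."""
    if not path:
        raise ValueError("path must contain at least one link")
    return min(path, key=lambda link: link.spare_mbps)


if __name__ == "__main__":
    # Hypothetical path from a well-connected endpoint to a server.
    path = [
        Link("source access", 100.0, 10.0),
        Link("ISP peering link", 622.0, 615.0),
        Link("backbone", 2488.0, 1200.0),
        Link("destination access", 1000.0, 100.0),
    ]
    worst = bottleneck(path)
    print(f"{worst.name}: {worst.spare_mbps:.0f} Mbps spare")
```

In this assumed example the heavily loaded peering link, not either access link, bounds the
achievable transfer speed, even though its raw capacity is higher than that of the source access
link.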

In order to overcome wide-area bottlenecks and obtain satisfactory Internet performance, endpoints
may require intrinsic support from the network. For example, upon traffic overload at a
certain link, the network may identify the link as a hotspot and divert endpoint traffic away from
the link, in a manner completely transparent to the endpoint itself. Given that the Internet as a
whole is composed of thousands of links, and, at any time, many of these links may have enough spare
capacity to carry diverted traffic, such a seamless rerouting of endpoint traffic is indeed plausible.
So, in practice, does the Internet divert endpoint traffic away from hotspot network locations, as and
when they arise? The answer to this question is, unfortunately, no.

Ideally, if the Internet were to support seamless diversion of traffic from hotspots, then this should
be handled, at least in part, by the Internet’s routing protocol. That the Internet does not do so
arises from fundamental limitations of the routing protocol. Next, we briefly overview the Internet’s
routing protocol suite and examine why it cannot help endpoints circumvent wide-area bottlenecks.

1.1.2  Internet’s  Routing  Protocol:  BGP 

Data traffic between any two points in the network will have to traverse several routers and links.
The Internet’s routing protocol suite determines the exact sequence of routers and links to be
traversed. A quick overview of the Internet routing protocol follows.

Internet routing overview. The Internet’s routing infrastructure is composed of several thousands
of routers and links. The ownership of this infrastructure, and its operational and management
responsibilities, are distributed across several independent administrative domains, called Internet
Service Providers (ISPs) or Autonomous Systems (ASes). Inside a domain, each ISP runs an
“intra-domain” routing protocol across its routers and links to determine how traffic should be routed
across the domain. Typically, when computing these routes, ISPs try to optimize the usage of their
routing infrastructure. For example, an ISP may route traffic in such a manner that no particular link
in its network carries an unduly large amount of traffic. This practice is commonly referred to as
traffic engineering.

Figure 1.2: BGP operation: An example showing the propagation of BGP reachability and routing
information across ISPs. The arrows show the propagation of routing announcements. (Figure
annotations: ISP A owns 10.0.0.0/18; ISP B is E's customer.)

When data traffic traverses multiple domains, ISPs employ the Border Gateway Protocol (BGP)
to exchange traffic. Figure 1.2 illustrates the operation of BGP. In this example, ISP J wishes to
communicate with ISP A. A owns the Internet address block 10.0.0.0/18. In BGP, A announces
reachability to the address block it owns to each of its neighbors D, H, E and B. Each neighbor,
in turn, announces this information further downstream, adding its domain number (called the
autonomous system number, or ASN) to the announcement. Consider ISP H. As the announcements
propagate through the network, H receives two announcements for 10.0.0.0/18: one through ISP D,
and another directly through ISP A. BGP path selection mechanisms stipulate that H prefer the
route with fewer ISPs (i.e., the announcement from A) over the other route available via ISP D. H
announces this route further downstream. Now, consider ISP E. Both the announcements it receives
for 10.0.0.0/18, through ISPs H and B, have the same number of intermediate ISPs. In this case,
E prefers the more cost-effective route. For example, due to commercial arrangements with its
neighbors, routing via H may require E to pay H (in which case H is referred to as the provider of
E). In contrast, B may be the customer of E, and therefore, routing via B will boost E’s revenue. As
a result, E prefers to use the route E→B→A to reach 10.0.0.0/18. Announcements propagate across
ISPs in this manner, eventually reaching ISP J. After the final selection of the route has been made,
data from J to A will traverse the path J→I→H→A. As mentioned above, within each domain,
ISPs employ traffic engineering mechanisms to manage their network. BGP-based routing in the
Internet is also commonly referred to as policy routing due to the explicit support in the protocol for
commercial arrangements between ISPs.
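
The two selection rules used in this example (prefer the route through fewer ISPs, and break ties
in favor of the commercially cheaper neighbor) can be sketched as follows. This is a minimal
illustration of the simplified narrative above, not the full BGP decision process; in deployed BGP,
a locally configured preference is normally compared before path length. The names `Announcement`,
`select_route` and `RELATIONSHIP_RANK`, and the "peer" relationships assumed for H's neighbors, are
hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Assumed commercial ordering: routes via customers earn revenue, routes via
# peers are roughly free, routes via providers cost money.
RELATIONSHIP_RANK = {"customer": 0, "peer": 1, "provider": 2}


@dataclass(frozen=True)
class Announcement:
    prefix: str
    as_path: Tuple[str, ...]  # ISPs from the announcing neighbor to the origin
    neighbor_is: str          # what the announcing neighbor is to us


def select_route(announcements: Iterable[Announcement]) -> Announcement:
    """Pick the shortest ISP-level path; break ties by commercial relationship."""
    return min(
        announcements,
        key=lambda a: (len(a.as_path), RELATIONSHIP_RANK[a.neighbor_is]),
    )


if __name__ == "__main__":
    prefix = "10.0.0.0/18"
    # ISP H hears the prefix directly from A and also via D.
    at_h = [
        Announcement(prefix, ("D", "A"), "peer"),
        Announcement(prefix, ("A",), "peer"),
    ]
    # ISP E hears equally long paths via its provider H and its customer B.
    at_e = [
        Announcement(prefix, ("H", "A"), "provider"),
        Announcement(prefix, ("B", "A"), "customer"),
    ]
    print("H uses path:", select_route(at_h).as_path)
    print("E uses path:", select_route(at_e).as_path)
```

Note that neither rule consults any measurement of load, latency or loss on the candidate paths.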

BGP characteristics. Notice that the ISP-level route selected by BGP is completely agnostic to the
existence of non-access bottlenecks, and the performance or availability of the ISP-level path itself.
The primary goal in BGP is to provide coarse-grained reachability information, while honoring the
economic considerations of intermediate ISPs. While BGP does favor policy-compliant paths with
fewer ASes in them, this choice is very static and does not reflect the quality of the path in any
manner. Also, the requirement for paths to be policy-compliant could arbitrarily “inflate” them,
resulting in long, circuitous routes.

Furthermore, in BGP, each ISP only exposes what it believes to be the single “best” route for
a destination to each of its neighbors. Therefore, an endpoint with a single ISP connection has a
single path per destination via its ISP. In short, the information propagated across ISPs in BGP is
heavily filtered and summarized. This property, in fact, helps BGP to scale to thousands of networks.
However, it gives rise to serious performance inefficiencies, discussed next.

Problems with BGP. The above characteristics of BGP imply that if a performance or availability
problem arises in one of the ISPs along the path to a destination, the end-network will either have to
“live with it”, or wait until BGP selects an alternate path to the destination. Past research studies
have shown that the latter process may, in fact, take a very long time. For example, Labovitz et
al. [62] study re-convergence in BGP and show that BGP may take up to several minutes to recompute
and propagate new routes after a failure. This implies that BGP lacks the necessary flexibility to
route endpoint traffic around network hotspots and failures in a timely and effective manner.

Several past studies have quantified the impact of this “rigidity” in BGP-based routing on the
Internet performance and availability experienced by endpoints. Feamster et al. [39] show that
endpoints relying on BGP-based routing may experience connectivity outages lasting up to several
minutes, or sometimes even hours. Savage et al. [99] quantify the sub-optimality in endpoint
transfer speeds and Web response times arising from relying on BGP-based routes. The authors
show that congestion and heavy traffic load on BGP paths can be avoided by dynamically selecting
alternate, non-policy-compliant Internet paths. This could result in substantially improved transfer
speeds and response times compared to traditional Internet routing. Tangmunarunkit et al. [110]
and Spring et al. [104] highlight another crucial inefficiency in Internet routing: adhering to traffic
exchange policies among ISPs often forces the paths between pairs of Internet endpoints to be close
to 1.5 times as long as the direct, “as-the-bird-flies” paths.

In short, these past studies establish that BGP routing significantly limits the ability of
well-connected end-networks to obtain good Internet performance and resilience to interruptions. This
leads one to ask if well-connected network endpoints can employ special mechanisms that work in
conjunction with Internet routing to extract the optimal performance from their high-speed Internet
connections.

1.1.3  The  Problem  and  Our  Approach 

In this dissertation, we first complement the above studies of inefficiencies in Internet performance
and availability with an investigation of traffic load on wide-area links. We study the extent to
which wide-area bottleneck links limit the performance of data transfers involving well-connected
endpoints. We find that a significant fraction of non-access links carry high volumes of traffic,
leaving very little spare capacity (under 10Mbps on many occasions). This, in turn, places an
upper bound on the speeds that transfers initiated by well-connected endpoints may attain. Since
BGP is ineffective at enabling a quick recovery from these performance problems, in this thesis,
we investigate the usefulness of mechanisms that work in conjunction with BGP to overcome the
bottlenecks.

Specifically, we seek to answer the following central question:

What  mechanisms  can  well-connected  end-networks  employ  to  circumvent  performance 
and  availability  problems  in  the  non-access  portion  of  the  Internet,  and  improve  their 
Internet  experience? 

As mentioned above, this issue has been partly addressed by past research studies. However,
most of these approaches require support from either ISPs, in terms of cutting-edge routing policies
or traffic engineering mechanisms, or from special purpose Internet-wide infrastructures, to
explicitly bypass BGP-determined routes and divert end-network traffic from heavily-loaded network
hot-spots. In contrast, our focus is on mechanisms that require neither special support from
network-wide infrastructure, nor any modifications to BGP or to common routing policies. Rather,
we study mechanisms that can simply be deployed by endpoints at their local networks (e.g., at
enterprise and university access routers, or inside very high-speed home DSL gateways) and be
seamlessly integrated into the existing wide-area Internet infrastructure.

The specific mechanism we study is called “Multihoming Route Control” and involves the end-network
buying Internet connections from multiple ISPs serving the city it is located in. More
importantly, we allow the end-network the ability to intelligently schedule its transfers across the
ISP connections so as to avoid performance glitches in ISPs, whenever they arise. The end-network
does not control how its ISPs route its traffic; rather, it can only control which ISP carries traffic
to a particular destination at any time. The ISPs themselves may use BGP to route packets toward
their destinations. In this sense, multihoming route control does not require any modification to the
current BGP protocol or to network infrastructure.
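
The only decision available to the end-network, namely which ISP carries traffic to a given
destination, can be sketched as a per-destination choice over locally collected measurements. The
sketch below assumes the end-network already has a recent round-trip time sample to each
destination through each ISP link, with `None` marking an ISP through which the destination is
currently unreachable. The function names, ISP labels, addresses and RTT values are hypothetical.

```python
from typing import Dict, Optional


def choose_isp(rtt_ms_by_isp: Dict[str, Optional[float]]) -> str:
    """Return the ISP with the lowest measured RTT to one destination."""
    reachable = {isp: rtt for isp, rtt in rtt_ms_by_isp.items() if rtt is not None}
    if not reachable:
        raise RuntimeError("destination unreachable through every ISP")
    return min(reachable, key=reachable.get)


def build_isp_map(
    measurements: Dict[str, Dict[str, Optional[float]]]
) -> Dict[str, str]:
    """Map each destination to the ISP link that should carry its traffic."""
    return {dest: choose_isp(per_isp) for dest, per_isp in measurements.items()}


if __name__ == "__main__":
    # Hypothetical RTT samples (ms) from a network multihomed to three ISPs.
    measurements = {
        "192.0.2.10": {"ISP-1": 80.0, "ISP-2": 45.0, "ISP-3": 60.0},
        "198.51.100.7": {"ISP-1": 30.0, "ISP-2": None, "ISP-3": 95.0},
    }
    for dest, isp in build_isp_map(measurements).items():
        print(f"{dest} -> {isp}")
```

Everything beyond the first hop is still decided by the chosen ISP and by BGP; the sketch only
selects the outgoing link.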

We conduct extensive active and passive measurements across a 68-node Internet-wide testbed
to quantify the performance benefits to end-networks from multihoming route control. We also use
these measurements to compare multihoming route control against past proposals requiring
special-purpose infrastructures, and show that our simple approach, in fact, offers performance
similar to these more powerful approaches. We also present an implementation of multihoming route
control for deployment in multihomed enterprise settings. We show that simple principles, such as
regularly monitoring ISP performance and minimally relying on the historical performance of ISPs, are
very effective in helping endpoints realize the potential benefits of multihoming in practice.
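
One plausible way to combine regular monitoring with minimal reliance on history is an
exponentially weighted moving average that gives most of the weight to the newest sample. The
sketch below is an assumption made for illustration only; the mechanisms actually implemented and
evaluated are the subject of Chapter 6. The class name and the weight `alpha = 0.8` are
hypothetical.

```python
from typing import Optional


class IspPerformanceEstimate:
    """Smoothed performance estimate for one ISP link (e.g., RTT in ms)."""

    def __init__(self, alpha: float = 0.8) -> None:
        # alpha is the weight of the newest sample; values near 1 mean the
        # estimate relies very little on historical performance.
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value: Optional[float] = None

    def update(self, sample: float) -> float:
        """Fold a new measurement into the estimate and return it."""
        if self.value is None:
            self.value = sample
        else:
            self.value = self.alpha * sample + (1.0 - self.alpha) * self.value
        return self.value


if __name__ == "__main__":
    estimate = IspPerformanceEstimate(alpha=0.8)
    for rtt_ms in (100.0, 50.0, 50.0):  # assumed periodic probe results
        print(f"sample={rtt_ms:.0f} ms, estimate={estimate.update(rtt_ms):.1f} ms")
```

With a weight this high, the estimate tracks a change in ISP performance within a few probes, at
the cost of reacting to transient noise as well.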

Also, we investigate whether mechanisms such as multihoming route control will be effective
in the future Internet at all, in the face of growing end-user access speeds, higher traffic loads and
shifting usage patterns. We study evolutionary patterns of the Internet’s ISP-level interconnection
and show that the Internet’s structure, together with BGP-style routing, renders it fundamentally
incapable of supporting good endpoint performance in the future. In light of this observation, we
propose simple rules for altering interconnections between ISPs to guarantee robust future endpoint
performance.

1.2  An  Integrated  Approach  to  Optimizing  Internet  Performance 

Optimizing and improving Internet performance has been the holy grail of the networking research
community for well over two decades. There have been numerous proposals for fine-tuning Internet
behavior, to bring Internet performance and reliability closer to that of other engineered systems
such as cars and the telephone network. However, researchers in the past have approached the
Internet performance optimization problem in a piece-meal fashion, attacking specific facets of it,
and coming up with interesting solutions to key sub-problems. Examples of such past approaches
abound: studies of routing pathologies, intra- and inter-domain traffic engineering, proposals for
modifying and improving routing protocol behavior, specifically BGP, and overlay routing.

In contrast with these past approaches, this dissertation takes a more integrated approach to the
problem: we first identify a primary cause for poor endpoint performance; then, we propose and
evaluate a simple mechanism that gives endpoints the necessary control to overcome performance
glitches; further, we compare our proposal against past well-established approaches to optimizing
network performance and show that our proposal is both extremely simple and as effective as more
complex approaches; finally, we also look into the future and ask if any of the existing proposals, be
it multihoming or any other scheme, will be effective when the network grows in size and usage.

We believe that taking a “systemic” view of the problem, and analyzing various different aspects
of it, lends our observations and mechanisms longer “shelf life”. We feel that, owing to
our approach, the key lessons from this thesis (i.e., encouraging richer first hop connectivity;
enabling endpoint control over routing for performance; and leaving the rest of the network simple
and untouched) will continue to apply to future Internet architectures, as well as to other problem
domains such as wireless home networking.

1.3  Dissertation  Outline 

The rest of this dissertation is organized as follows. In Chapter 2, we go over past approaches
for optimizing end-to-end performance in the Internet. Specifically, we discuss three classes of
mechanisms (ISP-based, third-party infrastructure-based and end point-based) and put the
contributions of this dissertation in perspective of these earlier approaches. In Chapter 3, we
present a large-scale measurement study of performance bottlenecks in the wide-area network. We
present a classification of these bottlenecks in terms of their location and latency, and discuss
their impact on end-to-end performance. Building on these observations, in Chapter 4, we show how
end-networks can employ multihoming route control to overcome the bottlenecks and improve their
Internet performance. We also investigate how many, and which, ISPs endpoints must employ to realize
optimal performance improvements. Further, in Chapter 5, we contrast the performance improvements from
route control against past third-party infrastructure-based approaches. Specifically, we ask whether
the limited routing flexibility of multihoming can provide benefits comparable to the arbitrary
flexibility of the past approaches. Next, in Chapter 6, we ask whether the potential benefits from
route control can indeed be realized in practical deployment scenarios. We implement and evaluate
several route control mechanisms and outline important “best common” route control practices. In
Chapter 7, we ask whether techniques such as route control will continue to provide good performance
in the future network. Specifically, we investigate the impact of certain growth and operational trends
of the Internet on its ability to support good performance in the face of growing end-user access
speeds and new user applications. Finally, in Chapter 8, we present our conclusions and discuss
issues for further study.

Chapter 2


Background  and  Approach 


Optimizing and improving Internet performance and availability has been a hot topic of research
for several years. Several proposals have been made for improving endpoint network stacks, such
as TCP buffer-size tuning [57], better congestion control algorithms [11] and improved loss recovery
mechanisms [71, 25]. Approaches for improving application-level performance, specifically,
the performance of Internet Web servers, are also well-studied (for example, better strategies for
accepting incoming connections [26], improving Web response time using caching [54] and client
characterization-based Web server adaptation [61]).

Our  foeus  in  this  dissertation  is  on  non-aeeess  performanee  bottleneeks  and  approaehes  to  op¬ 
timizing  Internet  performanee  in  faee  of  sueh  bottleneeks.  Sueh  approaehes  have  been  eonsidered 
in  both  the  researeh  and  eommereial  worlds,  and  some  are  even  employed  widely  today.  In  general, 
these  proposals  ean  be  elassifi  ed  into  three  broad  eategories: 

1. ISP-based: These are mechanisms adopted by ISPs to cater to traffic traversing their
networks. Examples include intra-domain and inter-domain traffic engineering mechanisms.

A  special  class  of  mechanisms  in  this  category  includes  approaches  for  performance-aware 
wide-area  Internet  routing.  These  special  mechanisms  often  require  global  coordination 
across  Internet  ISPs. 

2. Internet-wide infrastructure-based: These approaches involve the deployment of an
application-level infrastructure across several ISPs. Such an infrastructure is typically operated
and managed by a commercial third-party service provider. Examples include overlay network-based
mechanisms. Multiple competing overlay service providers may deploy individual overlay
infrastructures. The main purpose of an overlay infrastructure is to explicitly divert subscriber
traffic from congested wide-area links or unavailable routes.

3. Endpoint-based: These mechanisms are the simplest of the three. They require the
subscriber network to buy Internet connectivity from 2-3 ISPs serving its city. These approaches
also require deployment of special devices at the subscriber networks. However, these
mechanisms do not require any further support from the ISPs themselves, or from special
Internet-wide infrastructures. Moreover, they do not require any modification to the Internet’s
wide-area protocol (i.e., BGP). Examples include route control appliances.

                         ISP-based              Internet-wide            Endpoint-based
                                                infrastructure-based
Example                  Inter-domain           Overlay routing          Multihoming route
                         traffic engineering                             control
Optimizes end-to-end     No                     Yes                      Yes
performance?
Who deploys/manages?     Single ISP or          Third-party provider     Endpoint
                         neighboring ISPs
Fine-grained             No                     Yes                      Yes
optimization?
BGP-compliant?           Yes                    No                       Yes
Deployment ease?         Need ISP               Need third-party         Endpoint deploys
                         co-operation           deployment               unilaterally

Table 2.1: Approaches to optimizing Internet performance: This table shows the distinguishing
features of common approaches to circumventing network-level performance bottlenecks.

The distinguishing features of the above approaches are summarized in Table 2.1. We discuss
these approaches in further detail next.

2.1 ISP Traffic Engineering

Approaches for optimizing the performance of ISP networks, commonly referred to as intra-domain
traffic engineering, have been around for a very long time. For example, the ARPANET of the 1970s
and 1980s employed adaptive routing across the network. In this setting, routers made routing
choices based on the current state of the network. ARPANET routers maintained information about
current (estimated) delays to other routers and forwarded packets along the path with the minimum
estimated delay [72]. Further modifications were made to the routing protocol, for example, by
incorporating a new routing metric which was based on a joint function of both hop counts and
network delays, so as to minimize traffic fluctuations due to variations in network delays [58].
The key drawback of these “load-sensitive” approaches to traffic engineering was their reliance
on the availability of fresh information about network state to compute network routes: stale
network state could cause all network routers to avoid a common set of congested links, leading
to oscillations and congestion on other network links.
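The oscillation problem can be illustrated with a small sketch. It is a toy model under assumptions of our own, not a description of the ARPANET protocol: two parallel links, a delay estimate that grows linearly with the load seen in the previous round, and routers that all react to the same stale snapshot. The function name and parameters are hypothetical.

```python
def simulate_stale_routing(rounds=6, demand=10.0):
    """Toy model: two parallel links, and every router reacts to the same stale snapshot."""
    # Assumption: all traffic starts on link_1.
    load = {"link_1": demand, "link_2": 0.0}
    history = []
    for _ in range(rounds):
        # Delay estimate derived from the load observed in the previous round.
        delay = {name: 1.0 + link_load for name, link_load in load.items()}
        best = min(delay, key=delay.get)
        # Every router picks the same "best" link, so all traffic shifts at once.
        load = {name: (demand if name == best else 0.0) for name in load}
        history.append(best)
    return history


print(simulate_stale_routing(4))
```

In this model the chosen link alternates every round: the link that looked idle in the stale snapshot receives all the traffic and becomes the congested one.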

Modern day approaches to traffic engineering within ISP networks treat the problem as one of
“network management”: Network routers simply compute routes based on static link weights. The
ISP network management system, in turn, sets the parameters based on a network-wide view of
traffic and topology. To be precise, the static link weights are first set to be directly
proportional to the distance between routers and inversely proportional to the capacity of the
link between the routers. The static weights are then fine-tuned by the network management system
based on estimated network conditions, to directly minimize metrics such as the maximum link
utilization.
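A minimal sketch of this weight-setting step follows. It assumes the simplest form consistent with the description (weight = distance / capacity), routes each demand on its shortest path, and reports the maximum link utilization that a management system would then try to reduce by adjusting weights. The topology, demands and function names are hypothetical, and every destination is assumed to be reachable.

```python
import heapq


def initial_weight(distance_km, capacity_mbps):
    # Assumed form: directly proportional to distance, inversely proportional to capacity.
    return distance_km / capacity_mbps


def shortest_path(weights, src, dst):
    """Dijkstra over directed links {(u, v): weight}; returns the node list from src to dst."""
    adjacency = {}
    for (u, v), w in weights.items():
        adjacency.setdefault(u, []).append((v, w))
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # outdated heap entry
        for v, w in adjacency.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def max_utilization(topology, demands):
    """topology: {(u, v): (distance_km, capacity_mbps)}; demands: {(src, dst): mbps}."""
    weights = {link: initial_weight(*attrs) for link, attrs in topology.items()}
    load = {link: 0.0 for link in topology}
    for (src, dst), mbps in demands.items():
        path = shortest_path(weights, src, dst)
        for link in zip(path, path[1:]):
            load[link] += mbps
    return max(load[link] / topology[link][1] for link in topology)


topology = {("A", "B"): (100, 100), ("B", "C"): (100, 100), ("A", "C"): (300, 100)}
demands = {("A", "C"): 50, ("A", "B"): 30}
print(max_utilization(topology, demands))
```

With these weights the A-to-C demand prefers the two-hop path through B, so link A-B carries both demands; raising its weight is the kind of fine-tuning the management system would consider.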

To track network traffic, an ISP must first keep estimates of the traffic traversing its network.
In general, a traffic matrix describes the volume of data transmitted between each pair of routers
in an ISP’s network. Despite the centrality of traffic matrices to traffic engineering, today’s
routers provide little support for measuring them accurately. As such, the state-of-the-art in
techniques for traffic matrix estimation is essentially to derive the underlying traffic matrix
from measurements of link loads (obtained via the Simple Network Management Protocol (SNMP)) and
ISP routing configurations. However, since there are far fewer routers and links in an ISP network
than router-pairs, the estimates of the traffic matrix obtained in the above manner are highly
error-prone. The high variability of traffic across ISP networks also contributes to the
inaccuracy in traffic matrix estimates. A comprehensive survey of traffic matrix estimation
techniques and related traffic engineering mechanisms may be found in [47, 96, 52].
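The following toy example shows why link loads alone under-determine the traffic matrix. It assumes a hypothetical three-router chain A-B-C with fixed routes; the two traffic matrices below differ, yet they produce identical loads on both links, so no estimator that sees only link loads can tell them apart.

```python
# Hypothetical routing configuration: the links used by each router pair.
ROUTES = {
    ("A", "B"): [("A", "B")],
    ("B", "C"): [("B", "C")],
    ("A", "C"): [("A", "B"), ("B", "C")],
}


def link_loads(traffic_matrix):
    """Sum the volume of every router pair onto each link along its route."""
    loads = {}
    for pair, volume in traffic_matrix.items():
        for link in ROUTES[pair]:
            loads[link] = loads.get(link, 0) + volume
    return loads


tm_1 = {("A", "B"): 10, ("B", "C"): 10, ("A", "C"): 0}
tm_2 = {("A", "B"): 5, ("B", "C"): 5, ("A", "C"): 5}

print(link_loads(tm_1))
print(link_loads(tm_2))
```

Here there are three unknown pair volumes but only two link measurements, which is the small-scale version of having far fewer links than router-pairs.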

Upon estimating the traffic matrix, an ISP employs the estimates to set link weights and route
traffic across its network so as to optimize the usage of its network links and routers. The
routing weights may also be set to satisfy performance requirements of select subscribers. ISPs
typically configure link weights on their networks on a coarse time-scale: For example, a large
backbone ISP may schedule daily maintenance of its network between midnight and 4 AM, when the
new link weights may be computed and re-optimized to reduce network congestion. Details of
intra-domain traffic engineering mechanisms, proposals and extensions may be found in
[18, 19, 16, 120].

In inter-domain traffic engineering, techniques similar to the above are employed by two
neighboring ISPs to manage the load on the links they share (i.e., peering links). This has only
recently received the attention of the research and network operator community (see [90, 40] for
complete details). The key challenge in inter-domain traffic engineering is to ensure effective
inter-domain traffic management without revealing internal network information to neighboring
networks. ISPs often hold their internal routing mechanisms and topology sacrosanct, and a poorly
configured traffic engineering mechanism might reveal crucial hints regarding the internal
policies of an ISP to its neighbor. Also, while intra-domain traffic engineering primarily deals
with selecting paths based on link metrics, in inter-domain traffic engineering an ISP will have
to consider commercial arrangements with all its neighboring ISPs, along with link metrics, to
determine the best paths. Moreover, effective inter-domain traffic engineering is made all the
more challenging by the fact that the Internet’s wide-area routing protocol, BGP, does not convey
explicit information about the quality of paths. Therefore, most existing approaches to
inter-domain traffic engineering explore minimal modifications to BGP to enable support for
inter-domain traffic engineering. Due to these constraints, most ISPs are yet to adopt the latest
proposals for inter-domain traffic engineering.

More importantly, both the afore-mentioned ISP-based mechanisms (intra- and inter-domain
traffic engineering) are ineffective at improving the end-to-end Internet performance of endpoints
due to two additional reasons. First, these approaches rely on traffic matrix estimation which, as
mentioned earlier, is error-prone, and usually conducted on a daily time-scale. Therefore, these
techniques may not be responsive enough, from the perspective of endpoints, to overcome
unexpected performance or availability problems.
Figure 2.1: An end-to-end Internet flow: A typical Internet flow may traverse several ISPs end-to-end.

Second, these approaches are targeted at traffic confined to a single ISP or pairs of neighboring
ISPs. In practice, traffic in the Internet could traverse multiple ISPs and peering locations,
end-to-end. An example of a typical Internet flow is shown in Figure 2.1. The flow traverses
multiple ISPs along its path, and distinct pairs of neighboring ISPs may have very different
commercial arrangements, making coordinated traffic exchange between ISPs very challenging. The
“local” performance optimizations performed by individual ISPs, therefore, may not improve the
“global” end-to-end performance of Internet traffic. Recent research studies have proposed
modifications to BGP that make cooperative traffic exchange across multiple ISPs easier than it is
today. For example, the NIRA system [117] advocates incorporating support for AS-level
source-routing into BGP. This gives both end-networks and ISPs greater control over the exact
sequence of ISPs traversed by their data traffic. In turn, this feature makes it easier to
configure new, inter-domain traffic engineering-friendly routing policies. However, since
approaches such as NIRA require major changes to BGP, it may be several years before they are
widely adopted.

2.2  Overlay  Routing 

As mentioned in Chapter 1, Internet routing in itself cannot support good end-to-end performance
and resilience. As discussed in the previous section, ISP-based traffic engineering mechanisms are
not effective either. To address these inefficiencies, and to guarantee good end-to-end Internet
performance, several researchers have advocated deploying special-purpose Internet-wide
infrastructures to provide support for explicitly diverting subscriber traffic from highly
congested network locations. The special-purpose infrastructures are called Overlays and these
techniques are broadly referred to as Overlay routing mechanisms.

Figure 2.2: Overlay routing: This figure illustrates an overlay routing scenario.

In overlay routing, a third-party overlay service provider deploys a large collection of overlay
nodes in major cities across several ISPs throughout the Internet. These overlay nodes regularly
monitor the performance and availability of paths between themselves and to various Internet
destinations (using active probes, such as ICMP Pings, or passive measurements of existing data
transfers between the nodes). The instantaneous performance and availability information is, in
turn, accessible to the subscribers of the overlay service. The subscriber can then employ this
information to obtain better Internet performance, as follows: Whenever the subscriber’s default
Internet route to a destination (as determined by BGP) does not offer the expected performance or
availability, the subscriber can immediately route its traffic via an alternate path using the
overlay nodes. This is shown in Figure 2.2. The subscriber, in effect, hands its traffic over to
the overlay network, which routes the traffic across select overlay nodes, in a hop-by-hop manner,
toward the destination. Notice that the route traversed by the subscriber traffic between
neighboring overlay nodes in an overlay path must still conform to BGP policies. Therefore, the
overlay nodes, by forming an application-level network among themselves, “stitch” multiple such
BGP routes on-the-fly to yield better performing alternative paths to their subscribers. Notice
also that overlays provide end-networks a great deal of flexibility in selecting quite arbitrary
paths through the Internet.
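The path-selection step described above can be sketched as follows. This is a simplified illustration under our own assumptions: paths use at most one intermediate overlay node, and the RTT of an indirect path is approximated as the sum of the RTTs of its two BGP segments. The node names, RTT values and function name are hypothetical.

```python
def best_overlay_path(rtt_ms, src, dst, overlay_nodes):
    """Pick the lowest-RTT path among the direct BGP route and one-hop overlay detours.

    rtt_ms maps (from_node, to_node) to the most recently measured RTT in milliseconds.
    """
    # The default BGP route is always a candidate.
    candidates = {(src, dst): rtt_ms[(src, dst)]}
    for hop in overlay_nodes:
        # Only consider detours for which both segments have been measured.
        if (src, hop) in rtt_ms and (hop, dst) in rtt_ms:
            candidates[(src, hop, dst)] = rtt_ms[(src, hop)] + rtt_ms[(hop, dst)]
    path = min(candidates, key=candidates.get)
    return path, candidates[path]


rtt_ms = {
    ("S", "D"): 120,
    ("S", "O1"): 30, ("O1", "D"): 50,
    ("S", "O2"): 70, ("O2", "D"): 80,
}
print(best_overlay_path(rtt_ms, "S", "D", ["O1", "O2"]))
```

Each segment of the chosen path is still an ordinary BGP route; the overlay only decides which segments to stitch together.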
Two famous research studies advocating overlay routing are the Detour system (see [99]) and the
Resilient Overlay Network (RON) testbed (see [14]). In the Detour study, Savage et al. [99] study
the impact of the inefficiencies of wide-area routing on end-to-end performance in terms of
round-trip time (RTT), loss rate, and throughput. Using observations drawn from active
measurements between public traceroute server nodes, they compare the performance on default
Internet (BGP) paths with the potential performance from using alternate overlay paths. This work
shows that for a large fraction of BGP paths measured, there are alternate overlay paths offering
much better throughputs, RTTs, and loss rates. Andersen et al. advocate deploying RONs to address
problems with BGP’s fault recovery times, which have been shown to be on the order of tens of
minutes in some cases [14]. RON nodes regularly monitor the availability of paths between each
other, and use this information to dynamically select direct or indirect end-to-end paths. RON
mechanisms are shown to significantly improve the availability of end-to-end paths between the
overlay nodes, and to bring failure recovery times to within a few seconds.

In addition, a few commercial instantiations of overlays for improving Web download performance
have been proposed recently, such as Akamai SureRoute [4]. At regular intervals (called
“sampling intervals”), the Akamai SureRoute system [4] replicates client data transfers across
multiple overlay paths. The system then selects the overlay path offering the fastest performance
to the client, and employs the path for all subsequent transfers involving the client until the
next sampling interval. SureRoute is offered as a resilience-enhancing service to both Web content
providers and large enterprise customers.

A natural concern regarding overlays is that, with wide-spread deployment and usage, the vast
flexibility provided by overlays may actually result in worse network-wide congestion and
oscillations in traffic. Indeed, past theoretical studies have highlighted examples of scenarios
where allowing end-networks arbitrary flexibility in routing could substantially worsen the
average performance of the network [97, 59]. For Internet-like scenarios, however, researchers
have shown that wide-spread deployment of overlay routing does not necessarily result in worse
overall performance. In fact, the average end-user performance with overlay routing is
significantly better than that achieved with traditional routing protocols [89]. To summarize,
there are few hurdles to the wide-spread deployment of overlay networks. However, very few
end-networks today subscribe to overlays for optimizing their Internet performance [24]. Further,
most of these end-networks employ overlays for a small but important fraction of their transfers,
such as mission-critical financial transactions.

We now discuss another key drawback of overlay-based approaches. A common feature of
overlay-based approaches is that although the path between neighboring overlay nodes is
BGP-compliant, the end-to-end path across the overlay network could arbitrarily violate commercial
policies between ISPs. Imagine, for example, an overlay path consisting of the following sequence
of ISPs: A → B → C (with overlay nodes placed in ISPs A, B and C). Assume further that ISP B
is a customer of ISPs A and C (e.g., A and C may be much larger ISPs, with wider global reach,
and B may be a smaller regional ISP; connecting to both A and C enables B to reach a wide range
of Internet destinations). BGP’s routing policies dictate that a customer network shall not
provide transit for traffic from one provider destined to another provider network (this is
clearly not in the best interest of the customer, since it would be paying both of its providers
in order to carry their traffic!). It is easy to see that the above overlay path directly violates
this policy. In such cases of violation, the overlay service provider compensates for ISP B’s loss
of revenue via explicit payment. In turn, this increases subscription and usage costs for an
overlay client, especially if the client network uses the overlay for regular, high-volume Web
access.
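The policy violation in the A → B → C example can be checked mechanically. The sketch below uses the commonly used “valley-free” model of inter-ISP export policies, which is our choice of formalization rather than something taken from the text above: a path may climb customer-to-provider links, cross at most one peering link, and then only descend provider-to-customer links. The relationship labels and the function name are hypothetical.

```python
def is_valley_free(path, relationship):
    """Return True if the ISP-level path obeys the valley-free export rule.

    relationship maps (x, y) to "c2p" (x is a customer of y), "p2c" (x is a
    provider of y) or "p2p" (x and y are peers).
    """
    descending = False  # becomes True after a peering or provider-to-customer link
    for a, b in zip(path, path[1:]):
        rel = relationship[(a, b)]
        if rel == "c2p":
            if descending:
                return False  # climbing again after descending forms a valley
        elif rel == "p2p":
            if descending:
                return False  # at most one peering link, and only at the top
            descending = True
        elif rel == "p2c":
            descending = True
        else:
            raise ValueError(f"unknown relationship: {rel}")
    return True


# ISP B is a customer of both A and C, as in the example above.
relationship = {
    ("A", "B"): "p2c", ("B", "A"): "c2p",
    ("C", "B"): "p2c", ("B", "C"): "c2p",
}
print(is_valley_free(["A", "B", "C"], relationship))
print(is_valley_free(["B", "A"], relationship))
```

The path A → B → C descends into customer B and then climbs to provider C, so the check fails, whereas B sending its own traffic up to A passes.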

More importantly, however, it is unclear if the high degree of flexibility provided by overlay
networks to overcome BGP-driven routes and policies is, in fact, necessary to help end-networks
obtain improved Internet performance. Can end-networks make do with much less routing flexibility
and still achieve reasonable levels of Internet performance? Furthermore, do end-networks require
support from special-purpose infrastructure deployed in the wide-area Internet? This dissertation
provides direct answers to these questions, as we explain shortly.

2.3  Commercial  Route  Control  Products 

A number of vendors have recently developed “intelligent” end-network routing appliances or
“route control” mechanisms. In contrast with overlay-based mechanisms, which require an exclusive
infrastructure to enable alternate routes, these routing appliances are purely endpoint-based. As
such they are much easier to deploy and use. However, this very advantage also limits the amount
of routing flexibility these devices can provide in comparison with overlay-based approaches.

The basic premise in these approaches is that an end-network has Internet connections to multiple
ISPs. This practice is referred to as multihoming. Given the multiple ISP connections, these
commercial products allow a multihomed end-network to dynamically schedule its Internet traffic
across its upstream ISP links for optimizing performance or availability. These devices are
usually co-located with the end-network’s edge routers, as shown in Figure 2.3. The products work
by probing the performance of the end-network’s ISP links to various destinations and scheduling
traffic across the ISP that is most likely to offer the best performance or availability to a
particular destination. Therefore, route control appliances can help subscriber networks avoid
performance and availability problems in their ISPs’ paths by dynamically routing via alternate
ISPs.

Figure 2.3: Multihoming route control: This figure illustrates a route control scenario for an
end-network with three ISP connections.

Notice that each ISP provides the end-network with exactly one BGP path per destination, as
explained in Chapter 1. In effect, with three ISP connections, a multihomed end-network has three
paths per destination. Therefore, route control products help end-networks to make a clever choice
of which ISP link, or, equivalently, which BGP path, to employ for obtaining the best possible
performance or availability. While route control products provide more routing flexibility than
traditional singly-homed BGP-based routing, this flexibility is nowhere near that offered by
overlay networks. In this dissertation, we compare the relative trade-offs of the extents of
flexibility provided by the two systems. We also study how to utilize the modest amount of
flexibility provided by route control to effectively improve Internet access performance.
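The per-destination choice among ISP links can be sketched as below. This is an illustrative policy under assumptions of our own, not the algorithm of any particular product: each ISP link is probed a few times per destination, lost probes are recorded as None, and the link with the lowest median RTT among answered probes is selected. The ISP names, sample values and function name are hypothetical.

```python
from statistics import median


def choose_isp(probes):
    """probes: {destination: {isp: [rtt_ms, or None for a lost probe, ...]}}.

    Returns {destination: selected_isp}, or None when no ISP reached the destination.
    """
    choice = {}
    for destination, per_isp in probes.items():
        scores = {}
        for isp, samples in per_isp.items():
            answered = [s for s in samples if s is not None]
            if answered:  # skip ISPs whose probes were all lost
                scores[isp] = median(answered)
        choice[destination] = min(scores, key=scores.get) if scores else None
    return choice


probes = {
    "dest_1": {"ISP1": [40, 42, 41], "ISP2": [30, None, 31], "ISP3": [None, None, None]},
    "dest_2": {"ISP1": [20, 21, 22], "ISP2": [50, 52, 51], "ISP3": [25, 24, 26]},
}
print(choose_isp(probes))
```

A practical policy would also weigh loss rates and avoid switching links too frequently; the sketch only captures the basic idea of picking one of the available BGP paths per destination.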

While optimizing performance and availability (e.g., for Web transactions) is a primary design
goal of these devices, they often provide other important benefits. For example, several route
control devices allow end-networks to tune their traffic scheduling policies to obey specific
bandwidth usage limits. Such appliances typically provide subscribers with a “knob” to trade off
improved performance for lower bandwidth usage costs (see, for example, [98, 103, 107]). Other
appliances are targeted at balancing load across multiple, medium to low-speed ISP connections
(e.g., multiple DSL links at home) to optimize Web download speeds [35, 94, 79]. A few others are
specifically targeted at improving the responsiveness of delay-sensitive Internet applications
such as VoIP [53, 111]. A detailed study of alternate applications of route control is beyond the
scope of this dissertation.

2.4  An  Overview  of  Our  Approach 

Most of what we know about the performance bottlenecks experienced by well-connected endpoints
is driven by word-of-mouth. The common perception is that wide-area performance problems are
confined to the boundaries between neighboring ISPs in the Internet, i.e., peering locations.
Since no ISP has sufficient incentive to manage traffic at these locations, peering points are
presumed to carry high levels of traffic, resulting in inefficient end-to-end performance.
Step #1:  Performance bottlenecks in the current Internet [9]                   Chapter 3
Step #2:  Multihoming route control for avoiding performance bottlenecks [7]    Chapter 4
Step #3:  Overlay routing versus multihoming route control [8]                  Chapter 5
Step #4:  Practical multihoming route control strategies [10]                   Chapter 6
Step #5:  Performance scaling in the future Internet [5]                        Chapter 7

Table 2.2: Overview of our approach: The five key problems central to this dissertation. We also
provide citations for the preliminary conference versions of the corresponding results.


The first issue we tackle in this dissertation is to investigate the truth of this popular belief
(Step #1 in Table 2.2). Specifically, we seek to not only quantify the levels of traffic at
peering locations, but also expose other locations of the wide-area Internet which could
potentially limit the performance of well-connected endpoints. Apart from their location, we
derive several other characteristics of the bottleneck links, such as their latencies and
available capacity.

We then explore techniques that well-connected end-networks may employ to overcome these
performance problems. While many subscriber networks are beginning to employ route control
appliances for this purpose, very little is known about the tangible benefits of these products.
Subscriber networks often have to rely on word-of-mouth before deciding whether or not to employ
such mechanisms. A moderately useful source of information for such subscribers is the
accompanying set of “white papers” that outline the distinguishing features of the products.
However, such documents merely help subscriber networks to select between products on the basis of
which features are supported or not. The subscribers, in turn, will have to “try out” the
appliances in order to understand their relative merits. Even in such situations, it is impossible
for the subscribers to know if the Internet performance they observe from a specific route control
appliance can be improved even further. The exact improvements from route control are also tightly
coupled with the ISPs that the subscriber network connects to. For most end-networks, connection
cost, and in some cases, the service level agreements (SLAs) offered by ISPs are the primary
yardsticks in determining whom to buy connectivity from. However, there is very little guidance
available to subscriber networks in selecting ISPs that offer the expected level of performance,
while also charging reasonable connectivity fees.

In this work, we seek to establish a baseline for the expected performance and resilience
improvements from multihoming route control mechanisms (Step #2 in Table 2.2). Further, we aim
to investigate how end-networks should select their ISPs to realize the potential improvements
from route control. To realize this goal, we collect performance and availability data over
selected nodes in the server infrastructure of a large content distribution network operated by
Akamai Technologies [2]: we call this collection of nodes our “measurement testbed”. The
measurement samples are obtained at short intervals over several week-long periods.

Apart from helping us understand the tangible benefits from route control, this approach offers
three other important advantages:

1. Since Akamai has servers deployed at major cities throughout the world, our measurement
study can help make observations about the improvements from route control across several
geographic regions of the world.

2. Akamai’s server deployment style (i.e., several servers deployed across many different ISPs
in major cities) helps us understand the impact of the number and the exact choice of ISPs
on the expected performance improvements.

3. Since our measurements span long periods of time, we can derive hourly and diurnal patterns
in the performance improvements from route control. This information can be employed to
fine-tune the operation of route control appliances.

We also use the measurements above to contrast the improvements from route control with
overlay routing-based mechanisms (Step #3 in Table 2.2). Commercial overlay-based mechanisms
offer end-networks a viable option for obtaining better Internet performance and resilience. Hence
our comparison seeks to understand if there are clear benefits from overlay-based mechanisms that
route control simply cannot provide. To evaluate the benefits of overlay routing, we employ the
nodes in our testbed also as intermediate overlay hops. While our overlay testbed is not nearly of
the same size as a commercial overlay network, it is nevertheless larger than other overlay
testbeds employed in past research studies. We also evaluate a powerful form of overlay routing
which combines the first-hop route choices that multihoming provides with the flexibility provided
by overlay routing beyond the first-hop ISP(s). The routes enabled by this form of overlay routing
subsume the routes enabled by route control, and as such route control cannot outperform it. Our
goal, then, is to understand how inferior route control really is.

Following the comparison of overlay routing and multihoming route control, we present a study
of the effectiveness of simple route control strategies (Step #4 in Table 2.2). Most route control
white papers outline a few techniques that are believed to be key to the effectiveness of the
respective products. We implement a few of these techniques and propose several new alternatives.
In our evaluation, we highlight important drawbacks of existing approaches. We also seek to
outline several other best common practices for the design and operation of route control
appliances. The primary goal of our implementation effort, however, is to show that the simple
route control schemes we develop can help end-networks significantly improve their Internet
performance in practical deployment scenarios. A secondary goal is to show that the performance
from these techniques is also reasonably close to the maximum benefits from route control.

While the first four steps in this dissertation deal with performance improvements in the
Internet today, Step #5 (see Table 2.2) asks whether good Internet performance can be sustained in
the future Internet. In a few years’ time, the Internet will grow significantly in size.
Furthermore, endpoint access speeds will improve by orders of magnitude. New killer applications
will drastically alter the traffic mix on the network. Our goal, then, is to understand if the
Internet is fundamentally capable of accommodating these future changes. If this is indeed the
case, then we can expect that the techniques proposed for route control in this thesis, and other
related mechanisms, will continue to remain effective at extracting good Internet performance in
the future. However, it is possible that fundamental properties of the design and evolution of the
Internet may prevent this from happening. Specifically, we ask whether the Internet’s
interconnection topology and routing protocols together can limit the network’s ability to
accommodate key changes over time. We also identify guiding principles for the design and
evolution of the network that can improve its ability to support robust future performance.
Chapter 3


An  Empirical  Evaluation  of  Wide-Area  Internet  Bottlenecks 


It is widely believed that poor Internet performance arises primarily from constraints at the edges
or the access portions of the network. These narrow-band access links (e.g., dial-up, DSL, etc.)
limit the ability of applications to tap into the plentiful bandwidth and negligible queuing available
in the interior of the network. As access technology evolves, enterprises and end-users, given
enough resources, can increase the capacity of their Internet connections by upgrading their access
links. Indeed, end-networks today often deploy access links with speeds in excess of 100Mbps. The
positive impact on overall performance from employing higher access speeds may be insignificant,
however, if other parts of the network subsequently become new performance bottlenecks. Ultimately,
upgrades at the edges of the network may simply shift existing bottlenecks and hot-spots to
the non-access or the wide-area portions of the Internet. To optimize the Internet access experience
of end-users and end-networks, therefore, it is important to understand the location and traffic load
of links in the wide-area Internet. In this chapter, we present a study of the likely location and
characteristics of bottleneck links in the wide-area Internet.

As noted in Chapter 2, the wide-area portion of the Internet is composed of several hundreds
of ISPs, belonging to one of the four tiers of the Internet’s ISP hierarchy. Several of these ISPs
attach or “peer” with each other at various geographic locations. A wide-area bottleneck, therefore,
could lie either entirely inside an ISP network or at peering locations. A more technical definition
of wide-area bottlenecks follows:

Wide-area Bottlenecks. The wide-area bottleneck link on the path between two well-connected
Internet endpoints is the link within an ISP (i.e., an intra-ISP link) or at
the boundary between neighboring ISPs along the path (i.e., a peering link) that could
potentially constrain the transfer speed of an unconstrained TCP flow between the endpoints.

An unconstrained TCP connection has the following two characteristics: (1) It does not suffer
from access bandwidth or buffer size limitations at the sender or the receiver; and (2) When run for
a sufficiently long period of time, the average transfer speed of the flow equals the average available
capacity along the path. To clarify, the instantaneous available capacity of a network link is the raw
bandwidth of the link less the current traffic volume across the link. The instantaneous available
capacity of a path is the minimum of the available capacities of the links in the path. The average
available capacity is simply the time-average of the instantaneous available capacity on the path.
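
The three definitions above can be made concrete with a small sketch. The function names, the clamping of an overloaded link to zero, and the sample numbers are illustrative assumptions; they are not part of the measurement tool described in this chapter.

```python
from statistics import mean


def link_available(raw_mbps, traffic_mbps):
    """Instantaneous available capacity of one link: raw bandwidth less current traffic.

    Clamping at zero for an overloaded link is an assumption of this sketch.
    """
    return max(raw_mbps - traffic_mbps, 0.0)


def path_available(links):
    """Instantaneous available capacity of a path: the minimum over its links.

    `links` is a list of (raw_mbps, traffic_mbps) pairs, one per link.
    """
    return min(link_available(raw, load) for raw, load in links)


def average_available(snapshots):
    """Time-average of the path's instantaneous available capacity.

    `snapshots` holds one list of links per equally spaced sampling instant.
    """
    return mean(path_available(links) for links in snapshots)


# Illustrative numbers only (Mbps), not measurements from the study.
snapshots = [
    [(100.0, 40.0), (155.0, 120.0), (1000.0, 300.0)],
    [(100.0, 70.0), (155.0, 100.0), (1000.0, 300.0)],
]
print(average_available(snapshots))
```

In this example the constraining link changes between the two instants (the second link, then the first), which is why the path-level minimum is taken before averaging over time.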

Our primary objective is to devise a methodology for discovering and classifying wide-area
bottleneck links according to their location in the Internet’s ISP hierarchy; that is, to identify the size
and expanse of the ISP(s) that the bottleneck links belong to. A secondary objective is to derive
other important properties of the bottleneck links. One such property is the latency of the bottleneck
links, i.e., whether the typical bottlenecks are “short” links, confined to a city, or “long” cross-country
links. Finally, we also derive the available capacity of the bottleneck links.

The main challenge in characterizing Internet bottlenecks is to measure paths that are representative
of typical routes in the Internet, while avoiding biases due to a narrow view of the network
from a few probe sites. Further, we require probes that are themselves well-connected. Our results
are  based  on  measurements  from  26  geographically  diverse  probe  sites  located  primarily  in  the  U.S., 
each  with  very  high  speed  access  to  the  Internet.  We  measure  paths  from  these  sites  to  a  carefully 
chosen  set  of  destinations,  including  paths  to  all  Tier-1  ISPs,  as  well  as  paths  to  a  fraction  of  Tier-2, 
Tier-3,  and  Tier-4  ISPs,  resulting  in  2028  paths  in  total.  In  addition,  we  identify  and  measure  466 
paths  passing  through  public  Internet  exchange  points  in  order  to  explore  the  common  perception 
that  public  exchanges  are  a  major  source  of  congestion  in  the  Internet.  A  second  challenge  lies  in 
actually  measuring  the  bottleneck  link  and  reporting  its  available  bandwidth  and  location.  Due  to 
the  need  for  control  at  both  ends  of  the  path,  we  were  unable  to  leverage  any  of  the  existing  tools 
to  measure  the  available  bandwidth.  Hence,  we  develop  a  tool,  BFind,  which  measures  available 
capacity  using  a  novel  bandwidth  probing  technique. 

We apply our measurement methodology to empirically determine the locations, estimated available
bandwidth, and delay of non-access bottleneck links. Our results show that nearly half of the
paths we measured have a non-access bottleneck link with available capacity less than 50 Mbps.
Moreover, the percentage of observed paths with bottlenecks grows as we consider paths to destinations
in smaller ISPs. Surprisingly, the bottlenecks identified are roughly equally split between
intra-ISP links and peering links between ISPs. Also, we find that low-latency links, both within
and between ISPs, have a significant probability of constraining available bandwidth. Of the paths
through public exchanges that had a bottleneck link, the constrained link appeared at the exchange
point itself in nearly half the cases.

Chapter  outline.  Our  measurement  methodology,  along  with  details  on  our  choice  of  paths,  and  the 
design  and  validation  of  our  measurement  tool,  BFind,  is  discussed  in  Section  3.1.  In  Section  3.2, 
we present our measurement-based characterization of wide-area bottleneck links. Finally, in
Section 3.3, we discuss caveats of our approach, summarize the key observations in this chapter, and
analyze  the  implications  on  end-user  performance. 
Figure 3.1: ISP hierarchy: This figure illustrates the four tiers in the Internet ISP hierarchy.

3.1  Measurement  Methodology 

The  Internet  today  is  composed  of  an  interconnected  collection  of  Autonomous  Systems  (ASes). 
These  ASes  can  be  roughly  categorized  as  carrier  ASes  (e.g.  ISPs)  and  stub  ASes  (end-customer 
domains).  Our  goal  is  to  measure  the  characteristics  of  potential  performance  bottlenecks  that  well- 
connected  end-nodes  encounter  that  are  not  within  their  own  control.  To  perform  this  measurement, 
we  need  to  address  three  important  issues: 

1. Choosing an appropriate set of sources of measurement.

2.  Choosing  an  appropriate  set  of  probe  destinations. 

3.  Developing  a  tool  to  identify  and  characterize  bottlenecks  along  the  probed  paths. 

In the first two cases, we need to pay careful attention to the ISPs that the sources and destinations
are connected to. To clarify our choice of measurement probe sites with sufficient diversity
across ISPs, we briefly discuss the four-tier classification of ISPs. Then, we present the details of our
measurement probes and destinations, and our measurement tool.

3.1.1  ISP  Hierarchy 

We use a hierarchical classification of ISPs or ASes into four tiers (as defined in [108]). The tier of
an ISP provides an approximate indication of the size or the global reach of the ISP. For example,
ISPs in tier-1 of the hierarchy, such as AT&T and Sprint, are large ISPs that do not have any upstream
ISPs. Most ISPs in tier-1 have peering arrangements with each other. Lower in the hierarchy,
tier-2 ISPs, including Savvis, Time Warner Telecom and several large national ISPs, have peering
agreements with a number of ISPs in tier-1. ISPs in tier-2 also have peering relationships with each
other; however, they do not generally peer with any other ISPs. ISPs in tier-3, such as Southwestern
Bell and Turkish Telecomm, are small regional ISPs that have a few customer ISPs and peer with
a few other similar small ISPs. Finally, the ISPs in tier-4, for example rockynet.com, have very
few customers and typically no peering relationships at all [108]. The tiers in the ISP hierarchy are
illustrated in Figure 3.1.

3.1.2 Choosing Traffic Sources

Stub ASes in the Internet are varied in size and connectivity to their provider networks. Large stubs,
e.g., large universities and commercial organizations, often have high speed links to all of their ISPs.
Other stubs, e.g., small businesses, have a much slower connection.

At the core of our measurements are traffic flows between a set of sources, which are under
our control, and a set of destinations which are random, but chosen so that we can measure typical
Internet paths. Since we are interested in measuring bottlenecks faced by well-connected endpoints,
we must ensure that the sources of our measurements have high-speed Internet connections as well.
Large commercial and academic organizations are examples of such endpoints. In addition, we must
ensure that the sources are geographically diverse and span ISPs belonging to the four tiers. This
ensures that the results are not biased by repeated measurement of a small class of bottleneck links.

We use hosts participating in the PlanetLab project [88], which provides access to a large collection
of Internet nodes that meet our requirements. PlanetLab is an Internet-wide testbed of multiple
high-end  machines  located  at  geographically  diverse  locations.  Most  of  the  machines  available  at 
the  time  we  performed  our  experiments  (October  2002)  were  located  in  large  academic  institutions 
and  research  centers  in  the  U.S.  and  Europe.  Note  that  although  our  traffic  sources  are  primarily 
at  universities  and  research  labs,  we  do  not  measure  the  paths  between  these  nodes.  Rather,  our 
measured  paths  are  chosen  to  be  representative  of  typical  Internet  paths  (e.g.,  as  opposed  to  paths 
on  Internet2). 

Initially, we chose one machine from each of the PlanetLab sites as a candidate for our
experiments. While it is generally true that the academic institutions and research labs hosting PlanetLab
machines are well-connected to their upstream ISPs, we found that the machines themselves
are often on low-speed local area networks. Out of the 38 PlanetLab sites operational at the outset
of our experiments (October 2002), we identified 12 that had this drawback. We eliminated these
12  machines  from  the  set  of  sources  in  our  experiments. 

The unique upstream ISPs and locations of the remaining 26 PlanetLab sites are shown in
Figure 3.2. Specifically, as Figure 3.2(b) shows, the sources are connected to a diverse set of ISPs
belonging to the 4 tiers of the ISP hierarchy.
(a) Source of measurement

                                   tier-1   tier-2   tier-3   tier-4
Total #unique ISPs                     11       11       15        5
Avg. #ISPs per PlanetLab source      0.92     0.69     0.81     0.10

(b) First-hop connectivity of PlanetLab sources

Figure 3.2: Locations of PlanetLab sources (a) and their connectivity properties (b): Three of our sources and
seven destinations are located in Europe (shown in the inset). The size of the dots is proportional to the number of
sites mapped to the same location.


3.1.3  Choosing  Probe  Destinations 

We  have  two  objectives  in  choosing  paths  to  measure  from  our  sources.  First,  we  want  to  choose  a 
set of network paths that are representative of typical paths taken by Internet traffic. Second, we wish
to explore the common impression that public network exchanges, or NAPs (network access points),
are significant bottlenecks. Our choice of network paths to measure is equivalent to choosing a set
of  destinations  in  the  wide-area  as  targets  for  our  probe  tools.  Below,  we  describe  the  rationale  and 
techniques  for  choosing  test  destinations  to  achieve  these  objectives. 

Most end-to-end data traffic in the Internet flows between stub networks. One way to measure
typical paths would have been to select a large number of stub networks as destinations. However,
the number of such destinations needed to characterize properties of representative paths would
make the measurements impractical. Instead, we use key features of the routing structure of the
Internet to choose a smaller set of destinations for our tests.

Traffic originated by a stub network subsequently traverses multiple intermediate autonomous
systems before reaching the destination stub network. Following the definitions of AS hierarchy
discussed earlier, flows originated by typical stub source networks usually enter a tier-4 or a higher
tier ISP. Beyond this, the flow might cross a sequence of multiple links between ISPs and their
higher-tier upstream ISPs (uphill path). At the end of this sequence, the flow might cross a single
peering link between two peer ISPs after which it might traverse a downhill path of ASes in progressively
lower tiers to the final destination, which is also usually a stub. This form of routing,
arising out of BGP policies, is referred to as valley-free routing. We refer to the portion of the path
taken by a flow that excludes links within the stub network at either end of the path, and the access
links of either of the stub networks, as the transit path.
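
The valley-free property described above can be checked mechanically. The following is a minimal sketch, assuming each inter-AS link on a path has already been labeled as "up" (customer to provider), "peer", or "down" (provider to customer); the labels and the function name are hypothetical and are not taken from any tool used in this study.

```python
UP, PEER, DOWN = "up", "peer", "down"


def is_valley_free(link_types):
    """Return True if the sequence of inter-AS link types is valley-free.

    A valley-free path has zero or more uphill links, then at most one
    peering link, then zero or more downhill links.
    """
    uphill_allowed = True  # becomes False once a peer or downhill link is seen
    peer_allowed = True    # becomes False after the first peer or downhill link
    for link in link_types:
        if link == UP:
            if not uphill_allowed:
                return False
        elif link == PEER:
            if not peer_allowed:
                return False
            uphill_allowed = False
            peer_allowed = False
        elif link == DOWN:
            uphill_allowed = False
            peer_allowed = False
        else:
            raise ValueError(f"unknown link type: {link!r}")
    return True


print(is_valley_free([UP, UP, PEER, DOWN, DOWN]))
print(is_valley_free([UP, DOWN, UP]))
```

The first call describes a typical stub-to-stub transit path (uphill, one peering link, downhill); the second contains a "valley" because the path climbs again after descending.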

Clearly, non-access bottlenecks lie in the transit path to the destination stub network. Specifically,
the bottleneck for any flow could lie either (1) within any one of the ISPs in the uphill or the
downhill portion of the transit path or (2) between any two distinct ISPs in either portion of the transit
path. Therefore, we believe that measuring the paths between our sources and a wide variety of
different ISPs would provide a representative view of the bottlenecks that these sources encounter.

Due  to  the  large  number  of  ISPs,  it  is  impractical  to  measure  the  paths  between  our  sources  and 
all  such  ISP  networks.  However,  the  reachability  provided  by  these  ISPs  arises  directly  from  their 
position  in  the  AS  hierarchy.  Hence,  it  is  more  likely  that  a  path  will  pass  through  one  or  two  tier-1 
ISPs  than  a  lower  tier  ISP.  Therefore,  we  test  paths  between  our  sources  and  all  tier-1  ASes.  Since 
tier-2  ISPs  have  lesser  global  reach,  we  only  test  the  paths  between  our  sources  and  a  fraction  of  the 
tier-2  ISPs  (chosen  randomly).  We  measure  an  even  smaller  fraction  of  all  tier-3  and  tier-4  ISPs. 
The  number  of  ISPs  we  chose  in  each  tier  is  presented  in  Figure  3.3(b). 

In addition to choosing a target AS, we need to choose a target IP address within the AS for our
tests. For an ISP we choose, say <isp>, we pick a router that is a few (2-4) IP hops away from
the server www.<isp>.com (or .net as the case may be). We confirm this router to be inside the AS
by manually inspecting the DNS name of the router where available. Most ISPs name their routers
according to their function in the network, e.g., edge (chi-edge-08.inet.qwest.net) or backbone
(sl-bb12-nyc-9-0.sprintlink.net) routers. The function of the router can also be inferred from the names
of routers adjacent to it. In addition, we double check using the IP addresses of the ISP’s routers
along the path to www.<isp>.com (typically there is a change in the subnet address close to the
web server). We measure the path between each of the sources and the above IP addresses. The
geographic location of the destinations is shown in Figure 3.3(a). Each destination’s location is
identified by that of the traffic source with the least delay to it.
(a) Destinations probed (mapped to the closest source)

                              tier-1   tier-2   tier-3   tier-4
Number tested                     20       18       25       15
Total in the Internet [108]       20      129      897      971
Percentage tested                100       14        3      1.5

(b) Composition of the destination set

Figure 3.3: Locations and connectivity of the destinations: Each destination’s location in (a) is identified by the
PlanetLab source with minimum delay to the destination. Table (b) shows the composition of the destination set.

3.1.3.1  Public  Exchanges 

ISPs  in  the  Internet  peer  with  each  other  at  a  number  of  locations  throughout  the  world.  These 
peering  arrangements  can  be  roughly  categorized  as  public  exchanges,  or  NAPs,  (e.g.,  the  original 
4  NSF  exchanges)  or  private  peering  (between  a  pair  of  ISPs).  One  of  the  motivations  for  the 
deployment  of  private  peering  has  been  to  avoid  the  perceived  congestion  of  public  exchanges.  As 
part  of  our  measurements,  we  are  interested  in  exploring  the  accuracy  of  this  perception.  Therefore, 
we  need  a  set  of  destinations  to  test  paths  through  these  exchanges. 

We select a set of well-known NAPs, including WorldCom MAE-East, MAE-West, MAE-Central,
SBC/Ameritech AADS and PAIX in Palo Alto. For each NAP, we gather a list of small,
low-tier customers attached to the NAP. The customers are typically listed at the Web sites of the
NAPs. Since these customers are low tier, there is a reasonable likelihood that a path to these
customers from any source passes through the corresponding NAP (i.e., they are not multihomed to the
NAP and another ISP). We then find a small set of addresses from the address block of each of these
customers that are reachable via traceroute. We use the complete BGP table dump from the Oregon
route server [23] to obtain the address space information for these customers.

Next, we use a large set of public traceroute servers (153 traceroute sources from 71 ISPs) [113],
and trace the paths from these servers to the addresses identified above using a script to automate
finding and accessing working servers. For each NAP, we select all paths which appear to go through
the NAP. For this purpose, we use the router DNS names as the determining factor. Specifically,
we look for the name of the NAP to appear in the DNS name of any router in the path. From the
selected paths, we pick out the routers one-hop away (both a predecessor and a successor) from the
router identified to be at the NAP and collect their IP addresses. This gives us a collection of IP
addresses for routers that could potentially be used as destinations to measure paths passing through
NAPs.
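
The neighbor-selection step above can be sketched in a few lines. This is only an illustration under stated assumptions: a traceroute is represented as an ordered list of (DNS name, IP address) pairs, the NAP is recognized by a substring match on the DNS name, and the host names and addresses in the example are made up (the addresses come from documentation-only ranges).

```python
def nap_neighbors(path, nap_tag):
    """Collect the IPs of routers one hop before and after any NAP router.

    `path` is an ordered list of (dns_name, ip) hops; `dns_name` may be None
    when the router has no reverse DNS entry. A hop is treated as being at
    the NAP when `nap_tag` appears in its DNS name (case-insensitive).
    """
    neighbors = set()
    tag = nap_tag.lower()
    for i, (name, _ip) in enumerate(path):
        if name and tag in name.lower():
            if i > 0:
                neighbors.add(path[i - 1][1])   # predecessor
            if i + 1 < len(path):
                neighbors.add(path[i + 1][1])   # successor
    return neighbors


# Hypothetical traceroute output, for illustration only.
path = [
    ("core1.isp-a.example", "192.0.2.1"),
    ("mae-east.isp-a.example", "192.0.2.2"),
    ("gw.customer.example", "198.51.100.7"),
]
print(sorted(nap_neighbors(path, "mae-east")))
```

A name match alone does not prove that a path from a different source crosses the NAP, which is why the candidate addresses still need to be confirmed from each measurement source.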

However,  we  still  have  to  ensure  that  the  paths  do  in  fact  traverse  the  NAP.  For  this,  we  run 
traceroutes from the PlanetLab sources to the predecessor and successor IP addresses identified
above.  For  each  PlanetLab  source,  we  record  the  subset  of  these  IP  addresses  whose  traceroute 
indicates  a  path  through  the  corresponding  NAP.  The  resulting  collection  of  IP  addresses  is  used  as 
a  destination  set  for  the  particular  PlanetLab  source. 

3.1.4 Bottleneck Identification Tool: BFind

We now need a tool that we can run at the sources to measure the bottleneck link along the selected
paths. As discussed earlier, the bottleneck link is the link along the path where the available
bandwidth for an unconstrained TCP flow is the minimum.¹ We would like the tool to report the
available  bandwidth,  latency  and  location  (i.e.  IP  addresses  of  endpoints)  of  the  bottleneck  along  a 
path.  Before  we  describe  our  tool,  we  present  a  brief  overview  of  existing  tools  and  their  drawbacks. 

3.1.4.1 Existing Tools

The  development  of  tools  to  estimate  the  bandwidth  characteristics  of  Internet  paths  continues  to  be 
an  active  research  area  (see  [27]  for  a  more  complete  list).  Tools  like  bprobe  [30],  Nettimer  [63], 
and  PBM  [87]  use  packet-pair  like  mechanisms  to  measure  the  raw  bottleneck  capacity  along  a  path. 
Other  tools  like  clink  [32],  pathchar  [55],  pchar  [67],  and  pipechar  [49],  characterize  hop-by-hop 
delay,  raw  capacity,  and  loss  properties  of  Internet  paths  by  observing  the  transmission  behavior  of 
different  sized  packets.  A  different  set  of  tools,  e.g.,  pathload  [56],  focus  on  the  available  capacity 
on  a  path.  However,  these  tools  require  control  over  both  the  endpoints  of  the  measurement.  Since 
our  destinations  are  routers  not  in  our  direct  control,  these  tools  are  not  applicable. 

¹Notice that a particular link being a bottleneck does not necessarily imply that the link is heavily utilized or congested.

TReno [70] uses UDP packets to measure available bulk transfer capacity. It operates in a
single-ended mode and sends hop-limited UDP packets toward the destination. TReno emulates
TCP congestion control by using sequence numbers contained in the ICMP error responses. It
probes each hop along a path in turn for available capacity. Therefore, when used to identify bottlenecks
along a path, TReno will likely consume ICMP processing resources for every probe packet
at each router being probed as it progresses hop-by-hop. As a result, for high-speed links, TReno is
highly intrusive. In what follows, we describe the design and operation of our bottleneck identification
tool, BFind, which addresses the drawbacks of the above tools.

3.1.4.2  BFind  Design 

BFind’s design is motivated by TCP’s property of gradually filling up the available capacity based
on feedback from the network. First, BFind obtains the propagation delay of each hop to the destination.
BFind uses the minimum of the (non-negative) measured delays along a hop as an estimate
for the propagation delay on the hop.² The minimum is taken over delay samples from 5 traceroutes.
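
One way to read this estimation step is sketched below. It assumes each traceroute yields the cumulative delay to every router on the path, takes the per-hop delay as the difference between consecutive routers (set to zero when the difference is negative), and then takes the minimum across traceroutes. The function names and sample values are illustrative assumptions, not BFind's actual code.

```python
def hop_delays(cumulative_ms):
    """Per-hop delays from the cumulative delays to consecutive routers.

    A negative difference between consecutive routers is treated as zero.
    """
    delays = []
    previous = 0.0
    for delay in cumulative_ms:
        delays.append(max(delay - previous, 0.0))
        previous = delay
    return delays


def propagation_estimates(traceroutes):
    """Per-hop propagation delay estimate: minimum hop delay over all runs.

    `traceroutes` is a list of runs; each run lists the cumulative delay (ms)
    to every router on the path, in hop order, with the same hop count.
    """
    per_run = [hop_delays(run) for run in traceroutes]
    return [min(samples) for samples in zip(*per_run)]


# Three illustrative runs over a 3-hop path (the text uses 5 traceroutes).
runs = [
    [1.0, 5.0, 4.5],
    [1.5, 4.5, 6.0],
    [0.5, 5.5, 7.0],
]
print(propagation_estimates(runs))
```

Taking the minimum relies on the assumption that at least one of the samples for each hop was collected while the hop's queue was nearly empty.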

After this step, BFind starts a process that sends UDP traffic at a low sending rate (2 Mbps) to
the  destination.  A  trace  process  also  starts  running  concurrently  with  the  UDP  process.  The  trace 
process  repeatedly  runs  traceroutes  to  the  destination.  The  hop-by-hop  delays  obtained  by  each  of 
these  traceroutes  are  combined  with  the  raw  propagation  delay  information  (computed  initially)  to 
obtain  rough  estimates  of  the  queue  lengths  on  the  path.  The  trace  process  infers  that  the  queue 
on  a  particular  hop  is  potentially  increasing  if,  across  3  consecutive  measurements,  the  queuing 
delay  on  the  hop  is  greater  than  the  maximum  of  5ms  and  20%  of  the  raw  propagation  delay  on  the 
hop.  This  information,  computed  for  each  hop  by  the  trace  process,  is  constantly  accessible  to  the 
UDP  process.  The  UDP  process  then  uses  this  information  to  constantly  adjust  its  sending  rate  as 
described  below. 
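
The queue-growth test applied by the trace process can be written down directly. The sketch below assumes the queuing delay on a hop is the measured hop delay minus the propagation estimate computed earlier; the function names are hypothetical.

```python
def queue_threshold_ms(propagation_ms):
    """Threshold on queuing delay: the larger of 5 ms and 20% of propagation delay."""
    return max(5.0, 0.2 * propagation_ms)


def queue_is_building(hop_delay_samples_ms, propagation_ms, window=3):
    """True if the last `window` consecutive measurements all exceed the threshold.

    `hop_delay_samples_ms` are successive measured delays on one hop, oldest first.
    """
    if len(hop_delay_samples_ms) < window:
        return False
    threshold = queue_threshold_ms(propagation_ms)
    recent = hop_delay_samples_ms[-window:]
    return all(sample - propagation_ms > threshold for sample in recent)


# A short hop (10 ms propagation): threshold is 5 ms of queuing delay.
print(queue_is_building([12.0, 16.0, 17.0, 18.0], propagation_ms=10.0))
# A long hop (50 ms propagation): threshold is 10 ms of queuing delay.
print(queue_is_building([58.0, 59.0, 59.5], propagation_ms=50.0))
```

The fixed 5 ms floor keeps very short hops from being flagged on measurement noise, while the 20% term scales the threshold for long-haul links.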

If  the  feedback  from  the  trace  process  indicates  no  increase  in  the  queues  along  any  hop,  the 
UDP  process  increases  its  rate  by  200  Kbps  (the  rate  change  occurs  once  per  feedback  event,  i.e., 
per  traceroute).  Essentially,  BFind  emulates  the  increase  behavior  of  TCP,  albeit  more  aggressively, 
while  probing  for  available  bandwidth.  If,  on  the  other  hand,  the  trace  process  reports  an  increased 
delay  on  any  hop(s),  BFind  flags  the  hop  as  being  a  potential  bottleneck  and  the  traceroutes  continue 
monitoring  the  queues.  In  addition,  the  UDP  process  keeps  the  sending  rate  steady  at  the  current 
value until one of the following events occurs: (1) The hop continues to be flagged by BFind over
consecutive measurements by the trace process and a threshold number (15) of such observations
are made for the hop. (2) The hop has been flagged a threshold number of times in total (50). (3)
BFind has run for a pre-defined maximum amount of total time (180 seconds). (4) The trace process
reports that there is no queue build-up on any hop, implying that the increasing queues were only a
transient occurrence.

²If the difference in the delay to two consecutive routers along a path is negative, then the delay for the corresponding
hop is assumed to be zero.

In the first two cases, BFind quits and identifies the hop responsible for the tool quitting as
being the bottleneck. In the third case, BFind quits without providing any reliable conclusion about
bottlenecks along the path. In the fourth case, BFind continues to increase its sending rate at a
steady pace in search of the bottleneck.

If the trace process observes that the queues on the first 1-3 hops from the source are building,
BFind quits immediately to avoid flooding the local network (the first 3 hops usually include links in
the source stub network). Also, we limit the maximum send rate of BFind to 50Mbps to make sure
that BFind does not occupy too much of the local area network capacity. Hence, we only identify
bottlenecks with < 50Mbps of available capacity. If BFind quits due to these exceptional conditions,
it does not report any bottlenecks. Notice that BFind not only identifies the bottleneck link in a path,
but also estimates the available capacity at the bottleneck. This equals the send rate just before the
tool quit (upon identifying the bottleneck reliably). For paths on which no bottlenecks have been
identified, BFind outputs a lower bound on the available capacity.
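
The rate-control rules described in this section can be summarized as a small state machine. The sketch below is a simplified model, not BFind's implementation: it consumes a pre-recorded sequence of feedback events (one set of flagged hop numbers per traceroute) instead of sending real traffic, it assumes a fixed time per traceroute (`seconds_per_trace`), and it treats reaching the 50 Mbps cap as a reason to stop without reporting a bottleneck.

```python
START_KBPS = 2_000        # initial UDP rate (2 Mbps)
STEP_KBPS = 200           # increase per traceroute with no queue build-up
MAX_KBPS = 50_000         # send-rate cap (50 Mbps)
CONSECUTIVE_LIMIT = 15    # consecutive flags that confirm a bottleneck
TOTAL_LIMIT = 50          # total flags that confirm a bottleneck
MAX_RUNTIME_S = 180.0     # maximum total run time
LOCAL_HOPS = 3            # hops assumed to lie in the source stub network


def run_bfind(feedback, seconds_per_trace=1.0):
    """Replay trace-process feedback and return (bottleneck_hop, rate_kbps).

    `feedback` yields, per traceroute, the set of hop numbers (1-based) whose
    queues appear to be building. `bottleneck_hop` is None when no reliable
    conclusion is reached; the rate is then only a lower bound.
    """
    rate = START_KBPS
    consecutive = {}   # hop -> current run of consecutive flags
    total = {}         # hop -> flags over the whole run
    elapsed = 0.0
    for flagged in feedback:
        elapsed += seconds_per_trace
        if elapsed > MAX_RUNTIME_S:
            return None, rate                      # case (3): time limit
        if any(hop <= LOCAL_HOPS for hop in flagged):
            return None, rate                      # protect the local network
        if not flagged:
            consecutive.clear()                    # case (4): transient queues
            rate += STEP_KBPS
            if rate >= MAX_KBPS:
                return None, MAX_KBPS              # cap reached (assumed stop)
            continue
        # Queues are building somewhere: hold the rate and update the counters.
        consecutive = {hop: consecutive.get(hop, 0) + 1 for hop in flagged}
        for hop in sorted(flagged):
            total[hop] = total.get(hop, 0) + 1
            if consecutive[hop] >= CONSECUTIVE_LIMIT or total[hop] >= TOTAL_LIMIT:
                return hop, rate                   # cases (1) and (2)
    return None, rate


# Five quiet traceroutes, then hop 6 is flagged 15 times in a row.
print(run_bfind([set()] * 5 + [{6}] * 15))
```

In the example, the rate climbs from 2 Mbps to 3 Mbps during the quiet period and is then held while hop 6 keeps being flagged, so the reported estimate is the rate in force when the bottleneck is confirmed.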

Notice that in several respects, the operation of BFind is similar to TCP Vegas’s [25] rate-based
congestion control. However, our sending rate modification differs from that of Vegas for two reasons.
First, we want to ensure that the bottleneck link experiences a reasonable amount of queuing in order
to come to a definitive conclusion. Therefore, BFind needs to be more aggressive than Vegas. Second,
the feedback loop of the trace process is much slower than that of Vegas. One obvious drawback with
this design is that BFind is a relatively heavy-weight tool that sends a large amount of data. This
makes it difficult to find a large number of sites willing to host such experiments. BFind is not suitable
for continuous monitoring of available bandwidth, but rather for short duration measurements.
However, modifications to the design of BFind (see [50]) have addressed this issue.

3.1.4.3  BFind  Operation:  An  Example 

Figure 3.4 illustrates how BFind works. In Figure 3.4(a), BFind is run between planet1.scs.cs.nyu.edu
(NYU) and r1-srp5-0.cst.hcvlny.cv.net (Cable Vision Corp, AS6128, tier-3). As BFind ramps up
its transmission rate, the delay of hop 6 (the link between at-bb4-nyc-0-0-0-OC3.appliedtheory.net and
jfk3-core5-s3-7.atlas.algx.net) begins to increase. BFind freezes its sending rate as the delay on this
hop increases persistently. Finally, BFind identifies this hop as the bottleneck with about 26Mbps of
available capacity. This link also had a raw latency of under 0.5ms. The maximum queuing delay
observed on this bottleneck link was about 140ms.

Figure 3.4(b) shows a potential false-positive. Run between planetlab1.lcs.mit.edu (MIT) and
Amsterdam1.ripe.net (RIPE, tier-2), BFind observes the delays on various hops along the path
increasing on a short time-scale, causing BFind to freeze its UDP send rate quite often. The delay
on hop 15 increases reasonably steadily starting at around 80 secs. This steady increase causes
BFind to conclude that hop 15 was the bottleneck. However, it is possible that, similar to the other
hops, this congestion was transient too, as indicated by a dip in the delay on hop 15 after 100 secs.
Therefore, we cannot entirely rule out the possibility of false-positives in our analysis. But we do
believe that our choice of thresholds for BFind keeps the overall number of false
positives reasonably low. We derived these thresholds by testing various combinations and selecting
those that yielded the minimum estimation error. Notice that false negatives might occur in BFind
only when the path being explored was free of congestion during the run, while being persistently
overloaded at other times. Given that BFind runs for at least 30 secs, and sometimes up to 150 secs,
we think that false negatives are unlikely.

Figure 3.4: The operation of BFind: In either graph, queueing delay is shown on the left y-axis. The instantaneous
UDP rate is shown on the right y-axis. In (a), BFind identifies hop 6 as the bottleneck. In (b), BFind identifies hop
15 as the bottleneck, although this could potentially be a false positive.

3.1.4.4  BFind  Validation 

In  this  section,  we  first  present  simulation  experiments  in  ns-2  [81]  to  address  the  following  issues 
about our bottleneck estimation tool, BFind, presented in Section 3.1.4.2:

1. How accurately, and quickly, can BFind estimate the location of bottlenecks? Does the capacity
of the bottleneck links or their location on the path impact the speed or accuracy? How
does the presence of multiple bottlenecks affect the detection?

2. How does the bandwidth probing behavior of BFind compare with that of a TCP flow across
the bottleneck link? Is BFind more or less aggressive than TCP?

3. How does BFind compete with long-lived TCP cross traffic while probing for available bandwidth
at a bottleneck link (given that the bottleneck faced by the competing TCP flows is
different from that faced by BFind)?

To address the above issues, we port BFind to ns-2. We set up the path topologies shown in
Figure 3.5(a) and (b). We chose not to experiment with more complicated topologies since BFind
probes only along a single path. As a result, all other nodes and links in the topology become
auxiliary. In either figure, the path contains 10 hops (the hop-by-hop delays we used were observed on
traceroutes between a machine at CMU and www.amazon.com). The capacity of the bottleneck links
in the path is shown in italics.

Figure 3.5: Topology used for BFind NS simulations: The topologies are explained in detail below. “M” stands
for Mbps. The first row corresponds to the location of the bottleneck link being “close”, the second corresponds to
“middle” and the third to “far”.

In  Figure  3.5(a),  there  is  exactly  one  bottleneck  in  the  path.  To  test  the  probing  behavior  of 
BFind  we  vary  the  location  of  this  bottleneck  link  along  the  path  (between  “close”,  “middle”  and 
“far”, as explained below), as well as its raw capacity (either 22Mbps, referred to as “Setting
1”, or 45Mbps, referred to as “Setting 2”; both are shown in italicized bold font). In Figure 3.5(a), when
the  location  of  the  bottleneck  link  is  “close”,  hop  2  is  the  bottleneck  link;  when  the  location  is 
“middle”,  hop  5  is  the  bottleneck;  and  when  the  location  is  “far”,  hop  9  is  the  bottleneck.  Therefore, 
Figure  3.5(a)  pictorially  summarizes  6  different  experiments  with  a  single  bottleneck  link  on  the 
path  -  3  different  bottleneck  “Locations”,  each  with  2  different  bandwidth  “Settings”. 

The  topology  in  Figure  3.5(b)  has  two  similar  bottleneck  links.  In  “Setting  1”,  both  links  have 
an  identical  capacity  of  22Mbps;  in  “Setting  2”,  they  have  an  identical  capacity  of  45Mbps.  When 
the  “Location”  of  the  bottlenecks  is  “close”,  hops  2  and  3  are  chosen  to  be  the  identical  bottleneck 
links;  when  it  is  “middle”,  hops  2  and  5  are  the  bottlenecks;  and  when  it  is  “far”,  hops  2  and  9  are 
the  bottlenecks. 
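The twelve simulation configurations described above (six per topology) can be enumerated mechanically. The following is a minimal sketch: the capacities and hop positions come from the text, while the dictionary and function names are illustrative assumptions and are not part of the BFind ns-2 port.

```python
from itertools import product

# Capacities and hop positions are taken from the text; all names are illustrative.
SETTINGS_MBPS = {"Setting 1": 22, "Setting 2": 45}
SINGLE_BOTTLENECK = {"close": (2,), "middle": (5,), "far": (9,)}      # Figure 3.5(a)
TWO_BOTTLENECKS = {"close": (2, 3), "middle": (2, 5), "far": (2, 9)}  # Figure 3.5(b)


def enumerate_experiments(hops_by_location):
    """Return one (location, setting, bottleneck_hops, capacity_mbps) tuple per run."""
    return [
        (location, setting, hops_by_location[location], SETTINGS_MBPS[setting])
        for location, setting in product(hops_by_location, SETTINGS_MBPS)
    ]


single_runs = enumerate_experiments(SINGLE_BOTTLENECK)
double_runs = enumerate_experiments(TWO_BOTTLENECKS)

for run in single_runs + double_runs:
    print(run)
```

Each topology yields 3 locations x 2 settings = 6 runs, matching the count given for Figure 3.5(a).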

In either topology, unless otherwise specified, there is cross traffic between neighboring routers.
The cross traffic consists of 25 HTTP sessions in NS-2, each configured with 25 maximum connections.
In addition, the cross traffic also includes 25 constant rate UDP flows with default parameters
as set in NS-2. Cross traffic on the reverse path between neighboring routers is also similar. On
average, the cross traffic on all hops is similar.

            Capacity = 22Mbps (“Setting 1”)                  Capacity = 45Mbps (“Setting 2”)
            BFind            Available     TCP               BFind            Available     TCP
Location    Output   Time    BW (BFind)    Throughput        Output   Time    BW (BFind)    Throughput
Close       2        17.1    5.8           4.6               2        26.1    8.2           20.53
Middle      5        20.6    6.2           5.12              5        51.1    15.8          23.1
Far         9        19.6    6.2           4.5               9        57.1    20.2          24.09

Table 3.1: The bandwidth-probing performance of BFind: The table shows, for each of the six configurations of the
topology in Figure 3.5(a), the output obtained from BFind and its comparison with a TCP flow on the bottleneck
hop.

                              Capacity = 22Mbps (“Setting 1”)    Capacity = 45Mbps (“Setting 2”)
Location                      BFind Output    Time               BFind Output    Time
Close (2nd and 3rd hops)      2               17.1               2               17.1
Middle (2nd and 5th hops)     5               22.1               2               29.6
Far (2nd and 9th hops)        9               17.1               2               38.6

Table 3.2: Performance of BFind in the presence of two similar bottlenecks: The table shows the hops identified
by BFind as being the bottleneck in each of the six configurations in Figure 3.5(b), and the time taken to reach the
conclusion.

In Table 3.1, we show the performance of BFind for the six experiments in Figure 3.5(a). We
show whether BFind correctly identifies the appropriate bottleneck, the time taken until detection, and the
available bandwidth reported by BFind. In addition, we also report the average throughput of a TCP
connection across the bottleneck link. The TCP connection runs under the exact same conditions of
cross traffic as BFind. The results show that: (1) BFind accurately determines bottleneck links for
both capacity values. When the capacity of the bottleneck link is higher, the time taken by BFind is
not necessarily worse. (2) The throughput probed by BFind is roughly similar to that achieved by
the TCP connection. When the capacity of the bottleneck link is low, BFind probes somewhat more
aggressively than TCP; however, when the capacity is higher, BFind does not probe as aggressively.
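The second observation can be checked directly against the Table 3.1 values by taking the ratio of the bandwidth probed by BFind to the TCP throughput. This is only an illustrative calculation; the ratio and the name `aggressiveness` are our own shorthand and are not a metric defined in the text.

```python
# (location, setting) -> (available bandwidth reported by BFind, TCP throughput),
# copied from Table 3.1.
TABLE_3_1 = {
    ("close", 1): (5.8, 4.6),
    ("middle", 1): (6.2, 5.12),
    ("far", 1): (6.2, 4.5),
    ("close", 2): (8.2, 20.53),
    ("middle", 2): (15.8, 23.1),
    ("far", 2): (20.2, 24.09),
}


def aggressiveness(bfind_bw, tcp_throughput):
    """Ratio > 1 means BFind probed more bandwidth than TCP achieved (illustrative metric)."""
    return bfind_bw / tcp_throughput


ratios = {key: aggressiveness(*values) for key, values in TABLE_3_1.items()}

for (location, setting), ratio in ratios.items():
    print(f"Setting {setting}, {location}: BFind/TCP = {ratio:.2f}")
```

In all three "Setting 1" rows the ratio is above 1, and in all three "Setting 2" rows it is below 1, which is the pattern described in point (2).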

In Table 3.2, we show the results for the performance of BFind in the presence of two, very
similar, bottlenecks along a path (the topology in Figure 3.5(b)). The results show that BFind
identifies one of the two links as being a bottleneck. However, the output is non-deterministic.
To further investigate BFind’s ability to detect bottlenecks where multiple such links may exist
along a path, we slightly modified the topology in Figure 3.5(b) as follows: instead of two identical
bottlenecks along the path, we deliberately set one of them to a slightly higher capacity. In “Setting
1”, the second bottleneck link (i.e., hops 3, 5 and 9, respectively, in the three cases) now had a
slightly higher capacity of 25Mbps. In “Setting 2”, the capacity of the second link was chosen to
be 50Mbps. The results for these experiments are shown in Table 3.3. In almost all cases, BFind
correctly identifies hop 2 as the bottleneck link, despite the nearly identical capacity of the second
constrained link along the path. Also, the time taken for detection is not necessarily worse.

Capacity of bottleneck link

Far (2nd and 9th hops)    2    17.1    27.6

Table 3.3: Performance of BFind in the presence of two slightly different bottlenecks: The table shows the hops
identified by BFind as being the bottleneck in each of the six configurations in Figure 3.5(b) when the bandwidth
of one of the hops on the path is chosen to be slightly higher than that of the other.

Figure 3.6 (x-axis: Number of Flows): BFind interaction with competing long-lived TCP flows: The figure plots the
available bandwidth reported by BFind for the topology in Settings 1 and 2, when competing long-lived TCP flows
on the bottleneck hops are constrained to at most 10Mbps.

We also show results demonstrating the interaction of BFind with competing long-lived TCP
traffic. For these simulations, we use the single bottleneck topology in Figure 3.5(a), with a location
of “middle” (i.e., the bottleneck is hop 5). We eliminated cross traffic along hop 5 and, instead, started N
long-lived TCP flows between router-4 and router-5 such that the total bandwidth achieved by the
TCP flows is < 10Mbps at any point of time. We then started BFind on router 0 to probe for the
available capacity on the bottleneck link, hop 5. Notice that in “Setting 1”, BFind should report an
available capacity of at most 12Mbps (since the raw capacity is 22Mbps), while in “Setting 2”, it
should report at most 35Mbps. In Figure 3.6, we plot the available bandwidth reported by BFind in
either setting, as a function of the number of TCP flows, N. In “Setting 2”, the bandwidth reported
by BFind is always lower than 35Mbps, indicating that BFind does not have undesirable interactions
with competing TCP traffic. In “Setting 1”, the bandwidth from BFind is almost exactly 12Mbps
as long as N > 5, again showing that BFind competes fairly with long-lived TCP traffic. (However,
BFind is unfair in the RTT-proportional fairness sense.)

Destination Node    Path length    Pathload Report      Pipechar Report       BFind Report
CMU-PL              14             58.1 - 107.2Mbps     82.4Mbps              >39.1Mbps
Princeton-PL        12             91.3 - 96.8Mbps      94.5Mbps              >20.5Mbps
KU-PL               15             8.23 - 8.87Mbps      5.21Mbps (hop 12)     9.88Mbps (hop 12)
Pittsburgh-node     14             4.17 - 5.21Mbps      4.32Mbps (hop 11)     8.34Mbps (hop 11)
www.fnsi.net        11             N/A                  8.2Mbps (hop 10)      8.43Mbps (hop 10)
www.il.net          11             N/A                  19.21Mbps (hop 7)     32.91Mbps (hop 8)

Table 3.4: BFind validation results: Statistics for the comparison between BFind, Pathload and Pipechar
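The 12Mbps and 35Mbps reference values used above follow from subtracting the 10Mbps cap on the competing TCP flows from the raw capacity of hop 5. A minimal sketch of that arithmetic, with hypothetical function and constant names:

```python
TCP_CAP_MBPS = 10  # aggregate cap on the competing TCP flows, from the text


def expected_available_mbps(raw_capacity_mbps, cross_traffic_cap_mbps=TCP_CAP_MBPS):
    """Capacity left on the bottleneck once the capped TCP flows take their share."""
    return raw_capacity_mbps - cross_traffic_cap_mbps


setting_1 = expected_available_mbps(22)  # "Setting 1"
setting_2 = expected_available_mbps(45)  # "Setting 2"
print(setting_1, setting_2)
```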

Wide-Area  Tests.  Next,  we  present  the  results  from  a  limited  set  of  wide-area  experiments  to 
evaluate  the  available  bandwidth  estimate  and  the  bottleneck  location  accuracy  of  BFind.  To  validate 
the  available  bandwidth  estimate  produced  by  BFind,  we  compare  it  against  Pathload  [56],  a  widely- 
used  available  bandwidth  measurement  tool.  Pathload  estimates  the  range  of  available  bandwidth  on 
the  path  between  two  given  nodes.  Since  measurements  are  taken  at  either  end  of  the  path,  control 
is  necessary  at  both  end-hosts. 

To validate the bottleneck location estimation of BFind, we compare it with Pipechar [49], which
operates similarly to tools like pathchar [55] and pchar [67]. Pipechar outputs the path characteristics
from a given node to any arbitrary node in the Internet. For each hop on the path, Pipechar
computes the raw capacity of the link, as well as an estimate of the available bandwidth and link
utilization. We consider the hop identified as having the least available bandwidth to be the bottleneck
link output by Pipechar and compare it with the link identified by BFind. We also compare the
available bandwidth estimates output by BFind and Pipechar.
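The rule for reading a bottleneck out of Pipechar's per-hop report can be sketched as a simple minimum over hops. The record layout below (`HopEstimate`) and the sample numbers are hypothetical and chosen only for illustration; they do not reproduce Pipechar's actual output format or any measured path.

```python
from typing import NamedTuple


class HopEstimate(NamedTuple):
    """Hypothetical per-hop record: hop index, raw capacity, available bandwidth."""
    hop: int
    raw_capacity_mbps: float
    available_mbps: float


def least_available_hop(hops):
    """Return the hop with the least available bandwidth, treated as the bottleneck."""
    return min(hops, key=lambda h: h.available_mbps)


def same_bottleneck(per_hop_estimates, bfind_hop):
    """True when the least-available hop matches the hop reported by BFind."""
    return least_available_hop(per_hop_estimates).hop == bfind_hop


# Made-up example path with three hops.
example_path = [
    HopEstimate(hop=1, raw_capacity_mbps=100.0, available_mbps=80.0),
    HopEstimate(hop=2, raw_capacity_mbps=45.0, available_mbps=12.5),
    HopEstimate(hop=3, raw_capacity_mbps=155.0, available_mbps=60.0),
]
print(least_available_hop(example_path))
print(same_bottleneck(example_path, bfind_hop=2))
```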

For  these  experiments,  we  perform  transfers  from  a  machine  located  at  a  commercial  data  center 
in  Chicago,  IL  to  a  large  collection  of  destinations.  Some  of  these  destinations  are  nodes  in  the 
PlanetLab  infrastructure  and  hence  we  have  control  over  both  ends  of  the  path  when  probing  these 
destinations.  The  other  destinations  are  random  routers  inside  a  few  ISPs.  When  probing  the  path 
to  the  latter  destinations,  we  do  not  have  control  over  the  destination  end  of  the  path.  In  total,  we 
probe  30  destinations. 

A sample of the results from our tests is presented in Table 3.4. The first three machines belong
to the PlanetLab infrastructure. The fourth machine is located in Pittsburgh and attached via AT&T.
The source is a host located in a Chicago area data center. The corresponding hop number for
the bottleneck (if found) is also shown in parentheses. Note that since BFind limits its maximum
sending rate it cannot identify bottlenecks with high available capacity (see the first example in
Table 3.4). In the second case, the 180secs maximum execution time was insufficient for BFind to
probe beyond 20Mbps^. The rest of the results show that BFind reasonably consistently outputs
the same location and available bandwidth as Pathload and Pipechar.

We  also  performed  an  initial  cross-validation  of  our  approach  by  checking  if  PlanetLab  sources 
in  a  given  metro  area,  attached  to  the  same  upstream  ISP,  identify  the  same  bottleneck  links  when 
probing  destination  IP  addresses  selected  in  Section  3.1.3.  For  example,  in  the  Los  Angeles  metro 
area,  we  found  that  the  sources  at  UCSD,  UCLA,  UCSB,  and  ISI  all  identify  similar  bottlenecks 
in  paths  to  the  destinations  in  all  cases  where:  (1)  the  bottlenecks  are  not  located  in  their  access 
network  (CalREN2)  and  (2)  the  paths  are  identical  beyond  the  access  network. 

3.1.5  Metrics  Reported 

In addition to the available bandwidth and the latency measurements, we post-process BFind’s output
to report on the ownership and location of Internet bottlenecks. We first classify bottlenecks
based on ownership: bottlenecks either lie within ISPs, which we further classify by the tier of the
owning ISP, or between ISPs, which we further classify according to the tiers of the ISPs at each end
of the bottleneck. We identify the ISP owning the endpoint of any particular link using the whois
servers from the RADB [91] and RIPE [95] routing registries. Our second classification is based on the
latency of the bottleneck links. We classify bottlenecks according to three different levels of latency
- low latency (< 5ms), medium (between 5 and 15ms) and high (> 15ms). Though this is clearly a
rough classification, we chose these classes to correspond to links at a PoP, links connecting smaller
cities to larger PoPs, and long-haul links. For paths to the NAPs, we classify the path into three
categories - those that do not have a bottleneck (as reported by BFind), those that have a bottleneck
at the NAP, and those that have a bottleneck elsewhere. Again, we are only interested in non-access
bottlenecks. Finally, we present a cumulative distribution of the available capacity of bottlenecks
within a category.
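The latency classification above amounts to a three-way threshold test. The sketch below assumes that latencies of exactly 5ms and 15ms fall in the medium class, which the text does not state explicitly; the function name is illustrative.

```python
def latency_class(latency_ms: float) -> str:
    """Classify a link by latency: low (< 5ms), medium (5 to 15ms), high (> 15ms).

    Assumption: the boundary values 5ms and 15ms are treated as "medium".
    """
    if latency_ms < 5:
        return "low"
    if latency_ms <= 15:
        return "medium"
    return "high"


for sample_ms in (0.5, 10, 20):
    print(sample_ms, latency_class(sample_ms))
```

For instance, the hop 6 link in Figure 3.4(a), with a raw latency of under 0.5ms, would fall in the low-latency class.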

3.2  Results 

Over a period of 5 weekdays, we ran BFind between the source and destination sites. The experiments
were conducted between 9am and 5pm EST on weekdays. These tests identified a large
number (889) of non-access bottleneck links along many (2028) paths.

3.2.1  Path  Properties 

Before describing the properties of bottleneck links, we present some important overall characteristics
of the measured paths. Figures 3.7(b) and 3.8(b) summarize overall features of paths from PlanetLab
sites, classified by paths to ISPs of a particular tier. On the y-axis, we plot the normalized number
of links, i.e., the total number of links encountered of each type divided by the total number of paths
in each class. Each path class has a pair of bars. While the left bars in the graphs show the overall
average properties of the paths, the right bars show the average number of unique links that each
path class adds to our measurements. This number is significantly less than the actual link counts
(by a factor of 2 or 3). This is because links near the sources and destinations are probed by many
paths (and are double-counted). Such links can bias our measurements since they may appear as
bottlenecks for many paths. Therefore, we also present information about unique links instead of
describing average path properties alone.

^In > 97% of the paths we probed, BFind completed well before 180s, either because a bottleneck was found or
because the limit on the send rate was reached.

Figure 3.7: Relative prevalence of intra-ISP bottlenecks (panels: (a) intra-ISP bottlenecks; (b) all intra-ISP links;
(c) relative proportions: bottleneck links vs. all links): Graph (a) shows the average number of bottlenecks
of each kind appearing inside ISPs, classified by path type. The graph in (b) shows the total number of links
(bottleneck or not) of each kind appearing in all the paths we considered. In (a) and (b), the left bar shows the
overall average number of links, while the right shows the average number of unique links. Graph (c) shows the
relative fraction of intra-ISP bottleneck links of various types (left bar) and the average path composition of all
links (right bar).

Figure 3.8: Relative prevalence of peering bottlenecks (panels: (a) bottlenecks at peering links; (b) all peering links;
(c) relative proportions: bottlenecks vs. all links): Graph (a) shows the average number of bottlenecks of each
kind appearing between ISPs, classified by path type. The graph in (b) shows the total number of links (bottleneck
or not) of each kind appearing in all the paths we considered. In (a) and (b), the left bar shows the overall average
number of links, while the right shows the average number of unique links. Graph (c) shows the relative fraction of
peering bottlenecks of various types (left bar) and the average path composition for all links (right bar).
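The difference between the overall (left bar) and unique (right bar) link counts used in Figures 3.7(b) and 3.8(b) can be illustrated with a short sketch. It assumes each path is available as a list of (link identifier, link type) pairs; that representation, the function name and the sample data are hypothetical.

```python
from collections import Counter


def normalized_link_counts(paths):
    """Return (overall, unique) per-type link counts, each divided by the number of paths.

    paths: list of paths, each a list of (link_id, link_type) pairs (assumed layout).
    A link shared by several paths counts once per path in `overall`,
    but only once in total in `unique`.
    """
    if not paths:
        raise ValueError("at least one path is required")
    overall = Counter()
    unique = Counter()
    seen = set()
    for path in paths:
        for link_id, link_type in path:
            overall[link_type] += 1
            if link_id not in seen:
                seen.add(link_id)
                unique[link_type] += 1
    n = len(paths)
    return ({t: c / n for t, c in overall.items()},
            {t: c / n for t, c in unique.items()})


# Made-up example: two paths that share link "a" near the source.
sample_paths = [
    [("a", "tier-1"), ("b", "tier-2")],
    [("a", "tier-1"), ("c", "tier-2")],
]
overall_avg, unique_avg = normalized_link_counts(sample_paths)
print(overall_avg)
print(unique_avg)
```

In the made-up example, the shared link "a" is counted twice in the overall average but only once in the unique average, which is the double-counting effect described above.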

Note that Figure 3.7 shows intra-ISP links while Figure 3.8 shows peering links. Characteristics
of the entire paths are evident by examining the two together. For example, Figure 3.7(b) shows
that the average path between a PlanetLab site and one of the tier-2 destinations traversed about 4.5
links inside tier-1 ISPs, 2.0 tier-2 ISP links, and 0.5 tier-3 links. Figure 3.8(b), which illustrates the
location of the peering links, shows that these same paths also traversed about 0.25 tier-1 to tier-1
peering links, 0.75 tier-1 to tier-2 links, 0.2 tier-1 to tier-3 links, 0.2 tier-2 to tier-2 links, and a small
number of other peering links. The total average path length of paths to tier-2 ISPs, then, is the sum
of these two bars, i.e., 7 + 1.4 = 8.4 hops. Similar bars for tier-1, tier-3 and tier-4 destinations
show the breakdown for those paths. One clear trend is that the total path length for lower tier
destinations is longer. The tier-1 average length is 7.8 hops, tier-2 is 8.3, tier-3 is 8.3 and tier-4 is
8.8. Another important feature is the number of different link types that make up typical paths in
each class. As expected from the definition of the tiers, we see a much greater diversity (i.e., hops
from different tiers) in the paths to lower tier destinations. For example, paths to tier-4 destinations
contain a significant proportion of all types of peering and intra-ISP links.

To summarize, we find that links involving tier-1 ISPs (intra-ISP links or peering links) form a
significant fraction of the typical paths we measured. We also find that typical paths have far fewer
peering links than intra-ISP links.

3.2.2  Locations  of  Bottlenecks 

Figures 3.7(a) and 3.8(a) describe the different types of bottleneck links found on paths to destinations
belonging to the 4 tiers. The left bars in the graphs show the probability that the identified
bottleneck link is of a particular type, based on our observations. For example, from Figure 3.7(a),
we see that the bottleneck links on paths to tier-2 networks consist of links inside tier-1 ISPs 7% of
the time, tier-2 links 11% of the time, and tier-3 links 3% of the time (bottlenecks within tier-4 ISPs
appear only in 0.2% of the cases). From Figure 3.8(a), we see that peering links of different types
account for bottlenecks in tier-2 paths nearly 15% of the time, with tier-1 to tier-2 links appearing
as the most likely among all types of peering bottleneck links. These two graphs together indicate
that approximately 36% of tier-2 paths we measured had a bottleneck that we were able to identify.
The other 64% appear to have bottlenecks with an available capacity greater than 50Mbps.
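The 36% figure is the sum of the intra-ISP and peering percentages quoted above for tier-2 paths. A small worked check, treating "nearly 15%" as 15% (an approximation on our part), with illustrative variable names:

```python
# Percentages for paths to tier-2 destinations, as quoted in the text.
intra_isp_pct = {"tier-1": 7.0, "tier-2": 11.0, "tier-3": 3.0, "tier-4": 0.2}
peering_pct = 15.0  # "nearly 15%" in the text, rounded here

identified_pct = sum(intra_isp_pct.values()) + peering_pct
unidentified_pct = 100.0 - identified_pct

print(f"identified: {identified_pct:.1f}%, unidentified: {unidentified_pct:.1f}%")
```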

Figures 3.7(c) and 3.8(c) show the breakdown of links averaged across each type of path, for
intra-ISP and peering links, respectively. Comparing the heights of components in the left and right
bars gives an indication of the prevalence of the corresponding type of bottleneck link (left bar),
relative to its overall appearance in the paths (right bar). From Figure 3.7(c), it first appears that
lower-tier intra-ISP links are path bottlenecks in much greater proportion than their appearance in
the paths. For example, Figure 3.7(c) shows that tier-3 links make up 17% of the bottlenecks to
tier-1 destinations, but account for only about 2% of the links in these paths.
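One way to quantify the left-bar versus right-bar comparison is the ratio of a link type's share of bottlenecks to its share of all links. This ratio is our own illustrative summary and is not a metric defined in the text; the function name is hypothetical.

```python
def over_representation(bottleneck_share_pct, link_share_pct):
    """Ratio of a link type's share of bottlenecks to its share of all links.

    Values above 1 mean the link type is a bottleneck more often than its
    presence in the paths alone would suggest.
    """
    if link_share_pct <= 0:
        raise ValueError("link share must be positive")
    return bottleneck_share_pct / link_share_pct


# Tier-3 intra-ISP links on paths to tier-1 destinations: 17% of bottlenecks, about 2% of links.
tier3_ratio = over_representation(17.0, 2.0)
print(tier3_ratio)
```

The same comparison applies to the peering and latency breakdowns discussed later, using the percentages quoted there.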

Note, however, that the right bars in Figure 3.7(a) show the number of unique bottleneck links
that we observed. Considering the first set of left and right bars (i.e., all vs. unique bottlenecks
for paths to tier-1 destinations) in Figure 3.7(a), we notice that there is a significant difference in
the proportion of tier-3 bottleneck links. Upon further examination, we discovered that some of
the PlanetLab sites were connected to the Internet via a tier-3 ISP. A few of the ISP’s links were
bottlenecks for many of the paths leaving the associated PlanetLab site. More generally, though, we
see in Figure 3.7(c) that lower-tier intra-ISP links seem to be bottlenecks more frequently than we
would expect based on the appearance of these links in the paths.

A  similar  examination  of  Figure  3.8(c)  reveals  several  details  about  the  properties  of  bottlenecks 
at  peering  links.  Figure  3.8(c)  shows  that  tier-1  to  tier-1  peering  links  are  bottlenecks  less  frequently 
than  might  be  expected,  given  their  proportion  in  the  overall  paths.  Also,  peering  links  to  or  from 
tier-2,  tier-3  or  tier-4  ISPs  are  bottlenecks  more  frequently  than  expected.  For  example,  compare 
the proportion of tier-2 to tier-4 peering bottlenecks with the proportion of these links in the corresponding
overall path length (e.g., 17% vs. 2% for paths to tier-1, and 17% vs. 4% for paths to
tier-2). 

Looking at Figures 3.7(a) and 3.8(a) together, we can observe some additional properties of bottleneck
links. For example, total path lengths are around 8-9 hops (adding the heights of the bars
in Figures 3.7(b) and 3.8(b)), of which only 1-1.5 hops are links between different ISPs. However,
bottlenecks for these paths seem to be equally split between intra-ISP links and peering links (comparing
the overall height of the bars in Figures 3.7(a) and 3.8(a)). This suggests that if there is a
bottleneck link on a path, it is equally likely to be either in the interior of an ISP or between ISPs.
Given that the number of peering links traversed is much smaller, however, any individual peering
link is more likely to be the bottleneck than any individual intra-ISP link. Still, the fact that the
bottleneck on any path is equally likely to be either inside an ISP or between ISPs is surprising.

Another important trend is that the percentage of paths with an identified bottleneck link grows
as  we  consider  paths  to  lower-tier  destinations.  About  32.5%  of  the  paths  to  tier-1  destinations  have 
bottlenecks.  For  paths  to  tiers  2,  3,  and  4,  the  percentages  are  36%,  50%,  and  54%,  respectively. 
Note  that  while  paths  to  tier-3  appear  to  have  fewer  intra-ISP  bottlenecks  than  paths  to  tier-2,  this 
may  be  because  the  peering  links  traversed  on  tier-3  paths  introduce  a  greater  constraint  on  available 
bandwidth. 

To summarize, we found that wide-area bottlenecks are roughly equally split between intra-ISP
and peering links. We also found that bottlenecks are more prevalent in networks lower in the ISP
hierarchy.

3.2.3  Bandwidth  Characterization  of  Bottlenecks 

In  the  previous  section,  we  described  the  location  and  relative  prevalence  of  observed  bottleneck 
links. In this section, we discuss other properties of these bottlenecks. Specifically, we analyze the
available bandwidth at these bottlenecks, as identified using BFind.

Figure 3.9: Available capacity at bottleneck links (panels: (a) intra-ISP; (b) Tier-1 peering; (c) Lower tier peering):
Graph (a) corresponds to bottlenecks within ISPs. Graphs
(b) and (c) show the distribution of available capacity for bottlenecks in peering links involving tier-1 ISPs, and
those in peering links not involving tier-1 ISPs, respectively. We do not show the distributions for bottleneck links
between tiers 2 and 4 and those between tiers 3 and 4 since they were very small in number.

Figure 3.9 illustrates the distribution of available bandwidth of bottleneck links observed in different
parts of the network. The CDFs do not go to 100% because many of the paths we traversed
had more than 50 Mbps of available bandwidth. Recall that BFind is limited to measuring bottlenecks
of at most 50 Mbps due to first hop network limitations. Figure 3.9(a) shows the bottleneck
speeds we observed on intra-ISP links. The tier-1 and tier-3 ISP links appear to have a clear advantage
in terms of bottleneck bandwidth over tier-2 ISP bottlenecks. The fact that the tier-3 bottlenecks
we identified offer higher available capacity than tier-2 bottlenecks was a surprising result. Links in
tier-4 ISPs, on the other hand, exhibit the most limited available bandwidth distribution, as expected.
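The reason the CDFs stop short of 100% can be shown with a small sketch. It assumes the distribution for a category is normalized over all paths in that category, including paths on which no bottleneck below the 50 Mbps limit was identified; that reading, the function name and the sample numbers are assumptions made for illustration.

```python
from bisect import bisect_right

BFIND_LIMIT_MBPS = 50.0  # measurement limit stated in the text


def censored_cdf(bottleneck_mbps, paths_without_bottleneck, thresholds):
    """Fraction of all paths whose identified bottleneck is at or below each threshold.

    Paths with no identified bottleneck (more than 50 Mbps available) stay in the
    denominator, so the curve tops out below 1.0.
    """
    total = len(bottleneck_mbps) + paths_without_bottleneck
    if total == 0:
        raise ValueError("no paths given")
    ordered = sorted(bottleneck_mbps)
    return [(t, bisect_right(ordered, t) / total) for t in thresholds]


# Made-up example: three identified bottlenecks and one path above the limit.
curve = censored_cdf([10.0, 20.0, 40.0], paths_without_bottleneck=1,
                     thresholds=[10.0, BFIND_LIMIT_MBPS])
print(curve)
```

In the made-up example the curve reaches only 0.75 at the 50 Mbps limit, since one of the four paths had no identified bottleneck.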

In Figures 3.9(b) and (c) we consider the distribution of bottleneck bandwidth on peering links.
Tier-1 to tier-1 peering links are the least constrained, indicating that links between the largest network
providers are better provisioned when compared to links between lower-tier networks. Again,
we find, surprisingly, that tier-2 and tier-3 links exhibit very similar characteristics in their peering
links to tier-1 networks (Figure 3.9(b)). Also, peering links between tier-2 and tier-3 are not
significantly different from tier-2 to tier-2 links (Figure 3.9(c)). We do see, however, that bottleneck
peering links involving networks low in the hierarchy provide significantly less available capacity,
as expected. This is clearly illustrated in the bandwidth distributions for tier-1 to tier-4, and tier-3 to
tier-3 links.

In summary, bottlenecks involving tier-1 ISPs seem to be fairly well-provisioned and have the
highest available capacity. We also found that bottlenecks involving tier-3 ISPs had similar, if not
better, available bandwidths than those involving tier-2 ISPs.

Figure 3.10: Relative prevalence of bottlenecks of various latencies (panels: (a) bottleneck links; (b) all links;
(c) relative proportions: bottlenecks vs. all links): Graph (a) shows the average number of
bottlenecks of the three classes of latencies further classified into those occurring between ISPs and those occurring
inside ISPs. Graph (b) shows the actual number of links (bottleneck or not) of each kind appearing in all the paths.
Graph (c) shows the relative fraction of bottleneck links of various latency types (left bar) and the average path
composition of all links (right bar).


3.2.4  Latency  Characterization  of  Bottlenecks 

In this section, we analyze the latency of bottlenecks. In particular, we explore the correlation
between high-latency links and their relative likelihood of being bottlenecks. Figure 3.10 is similar
to Figures 3.7 and 3.8. Figure 3.10(b) shows the overall latency characteristics of the paths. For
example, paths to tier-2 destinations have an average of 5.3 low-latency intra-ISP, 1.4 low-latency
peering, 0.6 medium-latency intra-ISP, 0.1 medium-latency peering, 1.2 high-latency intra-ISP, and
0.4 high-latency peering links. In general, all path types have a high proportion of low-latency hops
(both intra-ISP and peering) and high-latency intra-ISP hops. The latter is indicative of a single
long-haul link on average in most of the paths we measured. While high-latency peering links
would seem unlikely, they do occur in practice. For example, one of the PlanetLab sites uses an ISP
that does not have a PoP within its city. As a result, the link between the site and its ISP, which is
characterized as a peering link, has a latency that exceeds 15 ms.

In Figure 3.10(c) we illustrate the prevalence of bottlenecks according to their latency. We can
observe that high-latency peering links are much more likely to be bottlenecks than their appearance
in the paths would indicate. In observed paths to tier-2 destinations, for example, these links are
18.5% of all bottlenecks, yet they account for only 4% of the links. This suggests that whenever a
high-latency peering link is encountered in a path, it is very likely to be a bottleneck. High-latency
intra-ISP links, on the other hand, are not overly likely to be bottlenecks (e.g., 11% of bottlenecks,
and 13.5% of overall hops on paths to tier-2).
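
One way to make this comparison concrete is to divide a link class's share of bottlenecks by its share of all links on the measured paths. The short Python sketch below does this for the two tier-2 examples quoted above; the function name `overrepresentation` is our own illustrative choice and is not part of the measurement tooling.

```python
def overrepresentation(share_of_bottlenecks: float, share_of_links: float) -> float:
    """Ratio of a link class's share of bottlenecks to its share of all links.

    A value above 1 means the class shows up as a bottleneck more often than
    its presence on the paths alone would suggest; below 1 means less often.
    """
    if share_of_links <= 0:
        raise ValueError("share_of_links must be positive")
    return share_of_bottlenecks / share_of_links


# Percentages quoted in the text for paths to tier-2 destinations.
high_latency_peering = overrepresentation(18.5, 4.0)
high_latency_intra_isp = overrepresentation(11.0, 13.5)

print(f"high-latency peering:   {high_latency_peering:.2f}x")
print(f"high-latency intra-ISP: {high_latency_intra_isp:.2f}x")
```

A ratio well above 1 for high-latency peering links and slightly below 1 for high-latency intra-ISP links matches the qualitative reading of Figure 3.10(c).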

In general, Figure 3.10 suggests that peering links have a higher likelihood of being bottlenecks,
consistent with our earlier results. This holds for low-, medium-, and high-latency peering links. For
example, very few paths have any medium-latency peering links, yet they account for a significant
proportion of bottlenecks in all types of paths. Also, low-latency peering links on paths to the lower
tiers (i.e., tier-3 and tier-4) have a particularly high likelihood of being bottlenecks, when compared
to paths to tier-1 and tier-2 destinations. Recall from Figures 3.9(b) and (c) that these lower-tier
peering bottlenecks also have much less available bandwidth.

To summarize, we find that high-latency intra-ISP links are not overly likely to be bottlenecks.
The opposite is true for high-latency peering links. We also find that peering links have a higher
likelihood of appearing as bottlenecks on wide-area paths.

3.2.5  Bottlenecks  at  Public  Exchange  Points 

We now present a study of bottlenecks along paths through public exchanges. As indicated in
Figure 3.11(a), we tested 466 paths through public exchange points. Of the measured paths, 170
(36.5%) had a bottleneck link. Of these, only 70 bottlenecks (15% overall) were at the exchange
point. This is in contrast to the expectation that many exchange point bottlenecks would be identified
on such paths. It is interesting to consider, however, that the probability that the bottleneck link is
located at the exchange is about 41% (= 70/170). In contrast, Figures 3.7(a) and 3.8(a) do not show
any other type of link (intra-ISP or peering) responsible for a larger percentage of bottlenecks.*

(a) Relative prevalence

    #Paths to exchange points             466
    #Paths with non-access bottlenecks    170
    #Bottlenecks at exchange point         70

(b) Available bandwidth distribution

Figure 3.11: Bottlenecks in paths to exchange points: Table (a) on the left shows the relative prevalence of
bottleneck links at the exchange points. Figure (b) shows the distribution of the available capacity for bottleneck
links at the exchange points.
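
The three percentages quoted for the exchange-point paths follow directly from the counts in Figure 3.11(a). The Python lines below simply recompute them as a check; the variable names are ours.

```python
# Counts reported in Figure 3.11(a).
paths_tested = 466
paths_with_bottleneck = 170
bottlenecks_at_exchange = 70

# Fraction of tested paths on which a non-access bottleneck was found.
frac_paths_with_bottleneck = paths_with_bottleneck / paths_tested

# Fraction of all tested paths whose bottleneck was at the exchange point.
frac_paths_exchange_bottleneck = bottlenecks_at_exchange / paths_tested

# Given that a bottleneck was found, fraction located at the exchange point.
p_exchange_given_bottleneck = bottlenecks_at_exchange / paths_with_bottleneck

print(f"{frac_paths_with_bottleneck:.1%}")
print(f"{frac_paths_exchange_bottleneck:.1%}")
print(f"{p_exchange_given_bottleneck:.1%}")
```

The distinction between the second and third quantities is the point of the paragraph: exchange-point bottlenecks are rare over all paths, but common among the paths that have a bottleneck at all.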


3.3  Measurement Caveats, Summary of Observations and their Implications

The key results from our study are summarized in Table 3.5. This study yields a number of
interesting and unexpected findings about the characteristics of wide-area bottleneck links. For example,
we find a substantial number of bottleneck links within ISPs (nearly 50% of the bottlenecks we
found were located inside ISP networks). In contrast, the conventional wisdom has been that most
wide-area bottlenecks are confined to peering locations, since there is little or no economic incentive
for ISPs to carefully manage the load on links they share with their neighbors. In addition, we
observed that low-latency links, whether within ISPs or between them, can also constrain available
bandwidth with a small, yet significant, probability.

Furthermore, our observations can provide some guidance when considering issues such as
choosing an access ISP and optimizing routes through the network. In what follows, we discuss
some of these issues in the context of our empirical findings. First, however, we discuss some of the
caveats of our measurement methodology.

3.3.1  A  Critique  of  Our  Measurement  Methodology 

We describe some possible shortcomings of our approach here. To approximate the measurement
of “typical” paths, we choose what we believe to be a representative set of network paths. While
the set of paths is not exhaustive, we believe that they are diverse in their location and network
connectivity. However, our test nodes are relatively USA-centric (only 3 international sources and
7 destinations) and do not measure international network connectivity well.

* However, in Figure 3.8(a), bottlenecks between tiers 1 and 3 in paths to tier-3 destinations are comparable to
bottlenecks at exchange points in this respect.

•  Non-access bottlenecks are equally likely to be links within ISPs or peering links between ISPs.

•  The likelihood of the existence of a bottleneck increases on paths to lower tier ISPs.

•  Internal links in lower tier ISPs appear as bottlenecks with greater frequency than their
overall presence in typical paths.

•  High-latency peering links are very likely to be the bottlenecks along the paths they
appear in. High-latency intra-ISP links, on the other hand, are not overly likely to be
bottlenecks.

•  Interior and peering bottlenecks in tier-2 and tier-3 ISPs exhibit very similar available
capacity.

•  When a bottleneck is found on paths through a public Internet exchange, the likelihood of
the bottleneck actually lying at the exchange is more than 40%.

•  All paths have a high proportion of low-latency links (interior and peering) and roughly
one high-latency interior link.

Table 3.5: Properties of wide-area Internet bottlenecks: Summary of key observations regarding wide-area
bottlenecks.

Route changes could also have a significant impact on our measurements. If an Internet route
changes frequently, it becomes difficult for BFind to saturate a path and detect a bottleneck.
Similarly, if an AS uses multipath routing, BFind’s UDP probe traffic and its traceroutes may take
different paths through the network. As a result, BFind may not detect any queuing delays despite
saturating the network with traffic. If either of these situations occurred, traceroutes along the tested
path would likely reveal multiple possible routes. However, despite our continuous sampling of
the path with traceroute during a BFind test, we did not observe either of these routing problems
occurring frequently. This is consistent with recent results showing that most Internet paths tend to
be stable, even on an hour’s timescale [119].
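
The check described above (looking for more than one route among the traceroute samples taken during a test) can be sketched in a few lines of Python. This is only an illustration under our own assumptions: each sample is taken to be an ordered list of hop addresses, and the function names and addresses are hypothetical rather than part of BFind.

```python
def distinct_routes(traceroute_samples):
    """Return the set of distinct hop sequences seen across the samples.

    Each sample is an ordered sequence of hop addresses (strings).
    """
    return {tuple(sample) for sample in traceroute_samples}


def path_is_stable(traceroute_samples):
    """True if every sample taken during the test followed the same route."""
    return len(distinct_routes(traceroute_samples)) <= 1


# Hypothetical samples: the third one diverges at the second hop, as could
# happen under a route change or multipath routing.
samples = [
    ["192.0.2.1", "198.51.100.7", "203.0.113.9"],
    ["192.0.2.1", "198.51.100.7", "203.0.113.9"],
    ["192.0.2.1", "198.51.100.8", "203.0.113.9"],
]
print(path_is_stable(samples[:2]))
print(path_is_stable(samples))
```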

The processing time taken by routers to generate traceroute ICMP responses can impact our
measurement of queuing delay and, therefore, bottlenecks in the network. Many researchers have
noted that ICMP error processing, typically done in the router “slow” processing path, takes much
longer than packet forwarding. In addition, some routers pace their ICMP processing in order
to avoid being overwhelmed. Either of these could cause the delays reported by traceroute to be
artificially inflated. However, recent work [46] has shown that slow path/fast path differences should
not affect traffic measurement tools in practice, since the typically observed ICMP processing delays
are on the order of 1-2 ms, well within the timescales we need for accurate bottleneck detection.

Address allocation may also skew our results. We rely on the address reported by routers
in their response to traceroute probes to determine their ownership. However, in some peering
arrangements, a router owned by an ISP is allocated an address from the peer ISP’s address space to
make configuration convenient. In such situations, our link classification may erroneously identify
the wrong link (off by one hop) as the peering link between the ISPs. However, we believe that the
common use of point-to-point links in private peering situations and separate address allocations
used in public exchanges (these both eliminate the above problem) reduce the occurrence of this
problem significantly.
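
To see how the off-by-one error arises, consider a minimal Python sketch of address-based link classification. It is not the classifier used in this study: the prefix-to-owner table, the ISP names and the helper functions are all hypothetical, and a real study would derive ownership from BGP or registry data.

```python
import ipaddress

# Hypothetical prefix-to-owner table (illustrative values only).
PREFIX_OWNER = {
    ipaddress.ip_network("10.0.0.0/18"): "ISP-A",
    ipaddress.ip_network("10.0.64.0/18"): "ISP-B",
}


def owner_of(addr):
    """Owner of the longest matching prefix, or None if no prefix matches."""
    ip = ipaddress.ip_address(addr)
    matches = [(net, owner) for net, owner in PREFIX_OWNER.items() if ip in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def classify_links(hops):
    """Label each consecutive hop pair as intra-ISP, peering, or unknown."""
    labels = []
    for a, b in zip(hops, hops[1:]):
        owner_a, owner_b = owner_of(a), owner_of(b)
        if owner_a is None or owner_b is None:
            labels.append("unknown")
        elif owner_a == owner_b:
            labels.append("intra-ISP")
        else:
            labels.append("peering")
    return labels


# Suppose the second router really belongs to ISP-B but is numbered out of
# ISP-A's address space. The true peering link is the first one, yet the
# address-based classifier reports the second.
hops = ["10.0.0.1", "10.0.0.2", "10.0.64.1"]
print(classify_links(hops))
```

In this example the classifier labels the first link intra-ISP and the second link peering, which is exactly the one-hop displacement described above.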

Finally, we note that our results represent an empirical snapshot of non-access Internet
bottlenecks. That is, we focus on collecting observations from a large number of paths, rather than taking
repeated measurements of a few paths over an extended period. While our approach provides a
wider view of the characteristics and locations of bottlenecks, we cannot judge, for example, how
stable or persistent the locations are.

3.3.2  ISPs  and  Provisioning 

Our measurements show that there is a clear performance advantage to using a tier-1 ISP. Our results
also show that small regional ISPs, e.g., tier-4 providers, have relatively low-speed connectivity to
their upstream ISP, irrespective of the ISP’s size. In addition, their networks often exhibit
performance bottlenecks. This may be considered a reflection of the impact of economics on network
provisioning, if we assume that ISPs lower in the AS hierarchy are less inclined to over-provision
their networks if typical customer traffic volume does not thus far require it. As a result, there
is a clear disadvantage to using a tier-4 ISP for high-speed connectivity. However, the trade-offs
between tier-2 and tier-3 networks are much less clear.

Paths to tier-3 destinations had a larger percentage of bottleneck links than tier-2 paths.
Despite this, we also observed that tier-2 and tier-3 bottlenecks show similar characteristics in terms
of available capacity, with tier-3 bottlenecks (both intra-AS and peering links) performing slightly
better in some cases. This might be explained if we conjecture that tier-2 ASes, by virtue of their
higher degree of reachability, carry a larger volume of traffic relative to their capacity, when
compared with tier-3 ASes. Extending this hypothesis, we might conclude that if a stub network desires
reasonably wide-spread connectivity from its ISP, then choosing a tier-3 ISP might be a beneficial
choice, both economically and in terms of performance, assuming that connectivity to tier-3 ISPs is
less expensive.

3.3.3  Route  Selection  for  Improved  Internet  Performance 

Our  measurements  show  that  a  large  fraction  of  Internet  paths  (nearly  50%  in  our  measurements) 
suffer  from  performance  bottlenecks.  However,  the  Internet  has  a  very  rich  topology.  Numerous  al¬ 
ternate  paths  with  a  reasonable  amount  of  available  bandwidth  do  exist  between  arbitrary  endpoints. 
Indeed,  the  remaining  50%  of  the  paths  we  measured  had  an  available  capacity  of  40-50  Mbps  or 
more.  This  is  true  across  most  non-access  links  irrespective  of  their  location  or  latency. 

The observations in this chapter naturally lead to the following key question: Is it possible to
avoid these wide-area bottlenecks, and improve Internet performance, by leveraging existing routing
protocols? While there has been considerable work on load-sensitive routing of traffic within an
AS, little is known about how to extend this across ASes. We address this question in the next two
chapters.

Chapter 4


A  Measurement-Based  Analysis  of  Multihoming 


The Internet consists of several hundred distinct ISPs. In most large cities of the world,
several tens of ISPs compete to provide Internet connectivity options to home users and businesses
alike. Apart from pricing and offered connectivity speeds, these ISPs differ from each other in
one crucial aspect: the performance and connectivity provided by the ISPs to different parts of
the Internet vary substantially with time, the destination involved, and the choice of the ISP itself.
For example, due to variations in traffic volumes in their networks (or in their neighboring ISPs’
networks), or due to traffic engineering, ISPs may differ in the performance they offer to various
Internet destinations over time. Similarly, depending on which ISPs Internet destinations are
connected to, it may be better to use one ISP versus another to communicate with a specific destination.

As  we  mentioned  in  Chapter  1,  when  an  end-network,  such  as  an  enterprise,  a  content  provider 
or  a  university  buys  Internet  connectivity  from  a  single  ISP,  it  is  restricted  to  using  the  paths  provided 
by  the  ISP  to  various  Internet  destinations.  In  general,  the  ISP  provides  exactly  one  BGP  path  per 
destination.  Therefore,  if  the  ISP,  or  networks  further  upstream,  face  performance  bottlenecks,  or 
availability  problems,  the  end-network  may  receive  poor  download  speed  and  response  times,  or 
even  stay  disconnected  from  key  Internet  destinations,  for  extended  periods  of  time. 

However, when the end-network subscribes to 2 or 3 different ISPs, an approach popularly
referred to as multihoming, it enjoys a modest improvement in the number of available routes per
destination. Indeed, each ISP provides one BGP path, yielding 2 or 3 almost distinct paths per
destination. The end-network can then intelligently switch between the ISPs, for example, based
on the time of the day, or the Internet destination, while always using the best-performing ISP for
a given transfer. Such an approach to Internet connectivity could significantly improve the Internet
performance and reliability of the subscriber networks. In general, we will refer to this approach as
multihoming route control (illustrated in Figure 4.1(a)). A more technical definition follows:

Multihoming Route Control is the technique in which an end-network buys
connections from multiple ISPs and intelligently schedules its traffic across the ISP
connections, so as to improve Internet transfer speeds, response times and reliability.

Multihoming has been traditionally employed by large end-networks to achieve “coarse-grained”
resilience from Internet service interruptions. In such a set-up, the multiple ISP connections are
employed in a “primary-backup” mode, where all traffic is moved over to an available ISP link upon
failure of the primary link (see Figure 4.1(b)). However, the advent and the expected growth of route
control devices promise to allow subscribers to leverage other benefits from multihoming. For
example, route control products can now be leveraged to optimize Web performance, transfer speeds,
bandwidth and even availability (on very fine time-scales) across multiple ISP links. Aside from
marketing statements, however, little is known about the tangible benefits end-networks can expect
from such products and services.

[Figure 4.1 panels: (a) Route control; (b) Primary-backup mode. Labels recoverable from the diagram:
“Backup”; “AS Path: 100 100 100”; “AS Num: 100, Owns 10.0.0.0/18”]

Figure 4.1: Multihoming: Figure (a) shows an example of a route control set-up. Figure (b) shows a traditional
multihoming set-up.

In this chapter, our goal is to quantify the extent to which subscriber end-networks can
benefit from employing multihoming route control mechanisms. We focus both on improvements in
Internet performance as well as reliability. Conceptually, our approach is to consider a network
subscriber in a major metropolitan area, and evaluate the relative benefits of choosing upstream ISPs
from several available options. We assume that the subscriber has little or no control over end-to-end
paths, but rather controls only which ISPs provide first-hop connectivity to the Internet, and how the
subscriber’s traffic is scheduled across the ISPs.

We collect and analyze several datasets in a step-by-step approach to quantify the performance
benefits of route control.* First, we evaluate the improvements in Internet round-trip time (RTT)
performance from a form of multihoming in which the subscriber has very coarse-grained control
over the ISP links used for data transmission. For example, the subscriber may have to use a single
ISP for all traffic, over an hour’s period. We refer to this as naive multihoming. Then, we illustrate
the improvements in RTT performance from per-connection control over traffic in a set-up where
the subscriber has connections to two ISPs. We refer to this as 2-multihoming. Finally, we analyze
the more general notion of k-multihoming, k ≥ 2, where the subscriber is multihomed to k available
ISPs in a given city. We establish a baseline in which we assume that it is possible for a subscriber
to employ all k ISPs and switch to the best performing ISP at each instant. By evaluating the
performance as k is increased, we provide insight into the incremental benefit from additional ISPs.
We quantify both the RTT as well as throughput improvements from k-multihoming. In addition,
we analyze the impact of the choice of ISPs and the impact of the time of the day and day of the
week on the benefits from k-multihoming. To quantify the reliability benefits of k-multihoming,
we collect two further datasets. The first dataset quantifies the availability provided by multihoming
based on active probe measurements. We use the second dataset to show how multihoming improves
the availability of Internet paths using estimates of availability of Internet routers.

* Here, and in the rest of the thesis, we use the terms multihoming, route control and multihoming route control
interchangeably.

Most of our data sets comprise measurements conducted over the servers and monitoring nodes
deployed by Akamai [2], a large content distribution service provider. These servers and monitors
are attached to a diverse set of ISPs (most nodes connected to a single ISP), with multiple Akamai
servers located in many major metropolitan areas. The network performance data collected at these
Akamai nodes allows us to compare performance across ISPs from the perspectives of multihomed
subscribers in different metropolitan areas.

In order to compute the potential improvements from multihoming route control using the above
datasets, we shall make three key assumptions in this chapter:

•  The end-network has perfect information about the performance and availability of routes via
each of its ISPs, whenever necessary.

•  The end-network can switch between candidate ISPs to a destination as often as desired.

•  The end-network can easily control the ISP link traversed by packets destined for its network.
We refer to this as “inbound control”.
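
Under these three assumptions, route control reduces to picking, for each destination and at each decision point, the ISP that currently performs best. The Python sketch below illustrates that idealized selection; it is not a deployable route control system, and the ISP names, destinations and RTT values are hypothetical.

```python
def pick_isp(rtt_ms_by_isp):
    """Return the ISP with the lowest RTT among those currently reachable.

    An RTT of None marks an ISP whose route to the destination is unavailable.
    Returns None if no ISP can reach the destination.
    """
    reachable = {isp: rtt for isp, rtt in rtt_ms_by_isp.items() if rtt is not None}
    if not reachable:
        return None
    return min(reachable, key=reachable.get)


def schedule(measurements):
    """Map each destination to its best ISP, given perfect per-ISP information."""
    return {dest: pick_isp(per_isp) for dest, per_isp in measurements.items()}


# Hypothetical RTTs (ms) from a 3-multihomed end-network to three destinations.
measurements = {
    "dest-1": {"ISP-A": 40.0, "ISP-B": 25.0, "ISP-C": 31.0},
    "dest-2": {"ISP-A": 18.0, "ISP-B": None, "ISP-C": 22.0},
    "dest-3": {"ISP-A": None, "ISP-B": None, "ISP-C": None},
}
print(schedule(measurements))
```

Because the selection is repeated whenever new measurements arrive, the second assumption (switching as often as desired) is what allows the schedule to track the best ISP over time.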

The key observation from our measurements is that, on average, multihoming can considerably
improve performance and reliability of end-networks. In terms of performance, for example, even
in a 2-multihoming situation, RTTs were improved by 25% on average for 3 out of 4 metro areas we
study. The improvements in throughput performance were ~20% on average. We also find strong
evidence of diminishing incremental performance benefits as more ISPs are added. We observe that
increasing beyond k = 3 provides little added performance or reliability. Comparing the optimal
multihoming solution to a random choice of k ISPs (for k ≤ 3), we find that random selection
degrades RTT performance by as much as 50%. This suggests that a careful choice of ISPs is key
to achieving the potential benefits of multihoming.

Chapter outline. We illustrate the performance improvements from naive multihoming in
Section 4.1. In Section 4.2, we illustrate the improvements from per-connection control of traffic
with two ISP connections (2-multihoming). In Section 4.3, we study the improvements due to
k-multihoming. We present the reliability analysis in Section 4.4. Finally, we summarize the
observations from this study, and discuss their implications, in Section 4.5.


4.1  Naive  Multihoming:  RTT  Performance 

In this section, we study the improvements in Internet RTT performance when subscribers employ
naive multihoming. Ideally, to realize the optimal benefits from route control, a subscriber may have
to exercise per-flow control over the ISP link used. (Per-byte control would, in theory, be even better,
but this form of control may interact poorly with current TCP implementations.) In contrast,
in naive multihoming, all of the subscriber’s traffic uses a single ISP connection for a certain interval
of time (e.g., a few hours), irrespective of the destination. At the end of the interval, the subscriber
may decide to use a different ISP link for the upcoming interval.

4.1.1  Measurement  Dataset 

In  our  analysis  of  naive  multihoming,  we  use  passive  measurement  data  collected  at  server  nodes 
deployed  by  Akamai  across  various  US  cities.  An  important  feature  of  this  data  is  that  the  collection 
points  are  connected  to  a  large  variety  of  ISPs.  Moreover,  there  are  multiple  metropolitan  areas  in 
which  three  or  more  collection  points  are  located,  each  connected  to  a  different  ISP.  We  use  such 
Akamai  servers  in  a  single  metro  area  as  stand-ins  for  a  multihomed  subscriber.  We  use  the  term 
Multihoming  Emulation  to  refer  to  this  procedure. 

The dataset we employ contains the average HTTP turnaround times for requests made by
Akamai servers back to various origin content provider servers (Figure 4.2). These requests are typically
initiated when an Akamai server does not have a valid object cached and has to retrieve it from the
origin server. The turnaround time for such HTTP requests is the time between the transfer of
the last byte of the request from the Akamai node and the receipt of the first byte of the response
from the origin server. Hence, the turnaround time offers a reasonable estimate of network delay.
Since the customer content providers of the CDN are large Web servers, we expect their servers to
be well-provisioned, and therefore the observed turnaround time should consist mainly of
network delay with almost no delay due to the Web server itself.

The turnaround times we measure are averaged every hour across requests sent to various origin
content providers. We collected this data for each hour over two five-day periods: Monday,
6th January 2003 to Friday, 10th January 2003 and Monday, 13th January 2003 to Friday, 17th January
2003 (both inclusive).

[Figure 4.2 diagram label: “all origin servers”]

Figure 4.2: Naive Multihoming: Akamai servers connected to different ISPs in the same city download objects
from all customer origin servers in order to serve them to clients. For this data set, turnaround times are averaged
over each hour across retrievals from various origin servers.

4.1.2  Measurement  Results 

We  compute  the  performance  improvements  from  naive  multihoming  as  follows.  For  each  hour,  we 
compare  the  average  turnaround  time  achieved  by  using  the  best  ISP  among  all  those  available  in 
the  city,  with  that  from  using  the  best  ISP  in  a  candidate  multihoming  option  (i.e.,  a  given  subset  of 
ISPs).  We  average  this  ratio  over  all  hours,  and  report  the  minimum  normalized  performance  metric 
(the  minimum  is  taken  over  all  possible  candidate  options). 

Formally, we compute the performance benefits from a given naive multihoming option
consisting of k ISPs (denoted by $OP_k$) as follows:

\[
N_{OP_k} = \frac{\sum_{t} HT_{OP_k}(t) \,/\, HT_{best}(t)}{Numvalid(t)}
\]

where $N_{OP_k}$ is the performance of using the k-multihoming option $OP_k$ in a given city, relative to
the performance of using all available ISPs. $HT_{OP_k}(t)$ denotes the best average turnaround time
performance among the k ISPs in the set $OP_k$ at hour t. $HT_{best}(t)$ is the best average turnaround
time performance at hour t over all the available carriers. The sum in the numerator is taken over
all hours t for which all the k ISPs have the average turnaround time statistics logged in the data set.
$Numvalid(t)$ counts the number of such instances t. For a very small fraction of the hours,
the average turnaround time data was unavailable for certain networks.

Figure 4.3: Naive k-multihoming: Figure (a) shows the 1-multihoming performance of the ISPs in each city, with
ISPs ranked according to their performance. Figure (b) shows the diminishing returns from k-multihoming in
each city.
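
As an illustration of the normalized metric $N_{OP_k}$ defined above, the following Python sketch averages the per-hour ratio of the option's best turnaround time to the best turnaround time over all ISPs. It is a minimal sketch, not the analysis code used for this chapter: the data layout (a dictionary from ISP to per-hour averages, with missing hours simply absent) and the toy values are assumptions, and $HT_{best}(t)$ is taken over whichever ISPs have data at hour t.

```python
def normalized_performance(turnaround, option):
    """Average over valid hours of HT_option(t) / HT_best(t).

    turnaround: dict mapping ISP name -> dict mapping hour -> average
                turnaround time; an hour with no data is absent.
    option:     iterable of ISP names forming the multihoming option OP_k.
    A value of 1.0 means the option matches the best available ISP in every
    valid hour; larger values are worse.
    """
    option = list(option)
    all_hours = set()
    for series in turnaround.values():
        all_hours.update(series)

    ratios = []
    for t in sorted(all_hours):
        # An hour is valid only if every ISP in the option has data for it.
        if not all(t in turnaround[isp] for isp in option):
            continue
        ht_option = min(turnaround[isp][t] for isp in option)
        ht_best = min(series[t] for series in turnaround.values() if t in series)
        ratios.append(ht_option / ht_best)

    if not ratios:
        raise ValueError("no hour has data for every ISP in the option")
    return sum(ratios) / len(ratios)


# Toy data: two hours, three ISPs (values are illustrative only).
toy = {
    "A": {0: 100.0, 1: 200.0},
    "B": {0: 150.0, 1: 100.0},
    "C": {0: 50.0, 1: 100.0},
}
n_a = normalized_performance(toy, ["A"])
n_ab = normalized_performance(toy, ["A", "B"])
n_abc = normalized_performance(toy, ["A", "B", "C"])
print(n_a, n_ab, n_abc)
```

In the toy data, ISP A alone is twice as slow as the best ISP in both hours; adding B helps only in the second hour, and using all three ISPs attains the baseline.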


In Figure 4.3(a) we plot the performance metric $N_{OP_1}$ for ISPs in a city against their ranks (the
ISP with rank 1 is the best in the city). The graph shows the first week of data; the second week
is very similar. Notice that in some of the cities, there are a few ISPs that give significantly better
performance than the others. For example, the best ISP in Seattle provides at least 7 times better
performance than any other ISP. There are also cities in which many ISPs provide similar performance.

From Figure 4.3(a) it is apparent that, in some cities, there were in excess of 50 ISPs (e.g.,
San Francisco). Evaluating all the options for k-multihoming to determine the best naive
k-multihoming option is computationally expensive. We reduce the amount of computation by
evaluating k-multihoming options against the performance of up to 20 top ISPs in each city (chosen based
on their 1-multihoming performance). This has a negligible impact on our results, as our analysis
showed that the performance of the top 20 ISPs is virtually indistinguishable from the performance
using all available ISPs in the city (these results are omitted).
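
The pruned search described above can be sketched as follows: rank ISPs by their 1-multihoming score, keep the top few, and enumerate k-subsets of that pool. This Python sketch is only an illustration under simplifying assumptions of our own (every ISP has data for every hour, and the pruned pool serves both as the candidate set and as the baseline); the function names and toy values are hypothetical.

```python
from itertools import combinations


def score(turnaround, option, pool):
    """Average per-hour ratio of the option's best time to the pool's best time."""
    n_hours = len(turnaround[pool[0]])
    total = 0.0
    for t in range(n_hours):
        best = min(turnaround[isp][t] for isp in pool)
        total += min(turnaround[isp][t] for isp in option) / best
    return total / n_hours


def best_option(turnaround, k, top_n=20):
    """Best k-subset among the top_n ISPs ranked by 1-multihoming score."""
    everyone = list(turnaround)
    ranked = sorted(everyone, key=lambda isp: score(turnaround, [isp], everyone))
    pool = ranked[:top_n]
    return min(combinations(pool, k), key=lambda opt: score(turnaround, opt, pool))


# Toy hourly turnaround times (illustrative only). A and B are good in the
# same hours, while C is good exactly when they are not.
hourly = {
    "A": [10.0, 10.0, 30.0, 30.0],
    "B": [11.0, 11.0, 30.0, 30.0],
    "C": [40.0, 40.0, 10.0, 10.0],
}
best_one = best_option(hourly, 1)
best_two = best_option(hourly, 2)
print(best_one, best_two)
```

In the toy data the best pair is A with C, even though C has the worst 1-multihoming score, because the two ISPs perform well in complementary hours. With 20 candidates, enumerating all subsets of a given size k remains feasible, which is the motivation for pruning before the search.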

In Figure 4.3(b), we show the potential performance benefits from naive k-multihoming for the
first week of data (again, the second week’s results are similar). Notice that k > 1 provides
significantly better performance than 1-multihoming in most locations. For a few cities, however,
the performance benefit is not as substantial due to a single ISP providing the best performance
almost all the time (e.g., Los Angeles). Also, beyond k = 3 the benefit from naive k-multihoming
is only marginally better than at smaller values of k for most cities.

Table 4.1 shows the order in which ISPs get added to the k-multihoming solution in New York
for increasing values of k. For each ISP, we also show its 1-multihoming rank and performance.
Notice that the best k-multihoming solution does not necessarily comprise the k best 1-multihoming
options (e.g., the third ISP has a rank of 9). Rather, ISPs are added based on their contribution to
the overall k-multihoming performance.

We  also  eonsider  how  oflen  eaeh  of  fhe  ISPs  is  employed  in  fhe  oplimal  sehedule  for  naive 
mullihoming.  In  parlieular,  we  are  inleresfed  in  whelher  an  ISP’s  eonlribulion  loward  performanee 
ISP      Rank   1-multihoming   k-multihoming
                performance     performance
ISP 1     1     1.72            1.72
ISP 2     2     1.93            1.33
ISP 3     9     2.61            1.17
ISP 4     3     2.05            1.09
ISP 5     4     2.29            1.07
ISP 6    19     3.16            1.04
ISP 7    17     3.03            1.03
ISP 8    13     2.93            1.03

Table 4.1: Rank vs overall performance: Ranks of the ISPs in the k-multihoming solutions at New York, k ≤ 8, in
the order in which they are added, along with the incremental performance improvement.
(a) Boston

Figure 4.4: Relative utilization of ISPs: For the cities of Boston and New York, respectively, the graphs show the
fraction of time the ISPs in the naive k-multihoming solutions at the city are utilized in the optimal schedule.

improvement is proportional to the frequency with which it is used in the optimal schedule. The
results for two cities, Boston and New York, are illustrated in Figure 4.4. The results show clearly
that the contribution to performance is not proportional to the usage. For example, the 6th ISP
in New York is used for a significant fraction of time in the 6-multihoming solution (Figure 4.4(b)).
However, the marginal benefit of adding this ISP to the 5-multihoming solution was less than 0.02
(Figure 4.3(b)). It is also possible that an ISP belonging to the best k-multihoming solution is
utilized for a very small fraction of time in the optimal schedule, but, whenever used, contributes
significantly to improving the overall performance. For example, ISP1 is used for a smaller fraction
of time than ISP2 for the best naive 2-multihoming solution in Boston (Figure 4.4(a)). However, the
contribution of ISP1 to the overall benefit due to 2-multihoming is clearly larger than that of ISP2.
To summarize, even naive multihoming can substantially improve the turnaround time performance
of subscriber networks. We also find that the contribution of an ISP to the overall improvement
is not proportional to how often it is used in the optimal schedule. Finally, the best set of ISPs
does not necessarily include the individual best ISPs. We explore this issue further in Section 4.3.

4.2 2-Multihoming: RTT Performance

In the previous section, we illustrated how naive multihoming could improve the average RTT
performance of a subscriber network. While useful to illustrate the RTT benefits of multihoming, this
analysis does not highlight the maximum possible performance improvements due to multihoming.
This arises from a key limitation of the dataset: the RTT performance data is averaged across
many content providers over the duration of an hour, whereas finer-grained control over the use of
ISP links could further improve the RTT performance. Moreover, multihoming could also improve
transfer speeds significantly, and provide quick failover. We need new datasets to analyze these
benefits of multihoming. In this and the subsequent sections, we analyze active measurement data
collected at selected Akamai server and monitoring nodes to address the above drawbacks. First, in
this section, we consider a setting in which the end-network has two ISP connections. We illustrate
the RTT benefits from intelligently controlling which of the two ISPs is used, at much finer time
granularity (e.g., on the order of a few minutes). We refer to this as 2-multihoming.

4.2.1  Measurement  Dataset 

To analyze 2-multihoming, we collect RTT performance statistics at 27 geographically distributed
Akamai monitoring nodes. One or two monitoring nodes are located in major cities in the U.S., with
multiple nodes in the same city attached to different upstream ISPs, as shown in Figure 4.5. Every 6
minutes, these nodes download designated objects directly from a large number of content providers
that are Akamai customers. For each attempted download, the performance monitor logs a number
of statistics, including the HTTP response code, turnaround time for the request (if successful), the
size of the object downloaded, the total response time, and any errors (if unsuccessful).

We focus, in particular, on the turnaround time, which is defined as the time between the transfer
of the last byte of the request from the Akamai node and the receipt of the first byte of the response
from the origin server. Hence, the turnaround time offers a reasonable estimate of network delay.
We collected these statistics at all 27 performance monitors for downloads made from about 80
content providers which are customers of Akamai. The data was collected between Thursday, 23rd
January, 2003 and Sunday, 26th January, 2003 (inclusive). Of the 80 content providers, 20 are the
top customers of Akamai; that is, those for which the Akamai network serves the largest number of
bytes.

Figure  4.5:  2-Multihoming:  Akamai  performance  monitors  in  a  given  city  are  connected  to  different  ISPs  and 
download  10KB  objects  at  6-minute  intervals  from  servers  belonging  to  80  content  providers. 

4.2.2  Measurement  Results:  2-Multihoming 

We use the measurements in the above dataset to compare the performance achieved by using the
best ISP link for each download, relative to that of using a single ISP for all downloads. We average
this ratio over downloads from all of the content providers and report this normalized performance
metric. We must also be careful to compare only those transactions for which both performance
monitors successfully downloaded the object at roughly the same time. We select cities in the U.S.
with 2 performance monitors, giving us four locations: Atlanta, Chicago, Dallas and New York. The
rest of the cities have only one performance monitor. The monitor nodes in these four cities can then
be used to measure the benefits of 2-multihoming employing the respective ISPs. More formally, the
computation may be expressed as:


$$N_X = \frac{\sum_{i,t} \left( M_X(P_i, t) / M_{best}(P_i, t) \right)}{\mathit{Numvalid}(P_i, t)}$$


where N_X is the performance of using ISP X, relative to 2-multihoming. M_X(P_i, t) denotes the
value of the turnaround time for the transfer initiated at time t by the monitor node attached to ISP
X to retrieve an object from content provider P_i. Similarly, M_best(P_i, t) is the best value (across
both ISPs) of the turnaround time for a transfer to the same city at time t from content provider
P_i. The sum in the numerator is over all (P_i, t) pairs such that there was a transfer logged at time
(a) Potential benefit   (b) Variability of benefits   (c) Relative usage of the ISPs

Figure 4.6: 2-multihoming evaluation: The average benefits are shown in (a). Graph (b) shows the median, 10th
and 90th percentile turnaround times for each ISP and for 2-multihoming. The relative usage of the two ISPs in
the optimal schedule is shown in (c).

t to content provider P_i via both the ISPs A and B. Numvalid(P_i, t) is a function that simply
counts the total number of such (P_i, t) pairs. Notice that the optimal value of N_X is 1, and this
occurs whenever ISP X is consistently the better of the two ISPs. If N_X > 1, then N_X - 1 denotes
the maximum improvement in performance possible from multihoming to both the ISPs (i.e., from
2-multihoming), compared to the performance seen while using ISP X alone.
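
As an illustration of the computation of N_X, the following minimal sketch averages the ratio M_X / M_best over the valid (provider, time) pairs. The log layout, the function name `normalized_performance` and the toy values are hypothetical and not taken from the measurement infrastructure.

```python
from typing import Dict, Optional, Tuple

# Hypothetical log layout: (provider, time slot) -> turnaround time in ms per ISP,
# with None marking a failed or missing download.
Log = Dict[Tuple[str, int], Dict[str, Optional[float]]]


def normalized_performance(log: Log, isp: str) -> float:
    """Average of M_X / M_best over (provider, time) pairs where every ISP succeeded."""
    total = 0.0
    num_valid = 0
    for per_isp in log.values():
        if any(v is None for v in per_isp.values()):
            continue  # skip pairs without simultaneous successful downloads
        best = min(per_isp.values())
        total += per_isp[isp] / best
        num_valid += 1
    if num_valid == 0:
        raise ValueError("no valid (provider, time) pairs")
    return total / num_valid


log = {
    ("p1", 0): {"A": 100.0, "B": 80.0},
    ("p1", 1): {"A": 50.0, "B": 100.0},
    ("p2", 0): {"A": 60.0, "B": None},  # excluded: the download via ISP B failed
}
n_a = normalized_performance(log, "A")  # (100/80 + 50/50) / 2
n_b = normalized_performance(log, "B")  # (80/80 + 100/50) / 2
```

Here N_A - 1 = 0.125, i.e., in this toy log 2-multihoming could improve on ISP A alone by 12.5% on average.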

The results for the performance benefits from 2-multihoming are shown in Figure 4.6(a), which
indicates the value of N_X for the two ISPs. The ISPs in the four cities were tier-1 ISPs. In all four
cities, 2-multihoming clearly offers performance benefits, albeit to varying degrees. For example,
Chicago's ISP1 provides nearly optimal performance by itself (N_ISP1 = 1.09). However, in the
other cities, the performance benefit from 2-multihoming is at least 25% on average.

Figure 4.6(b) illustrates the absolute RTT improvement for the median, 10th, and 90th percentile
turnaround time. Note that 2-multihoming uniformly improves the high-end (90th percentile)
turnaround times, but has less effect on the median and low-end (10th percentile) performance. Also,
the extent of the improvement varies across cities. Figure 4.6(c) shows the fraction of time when one
of the two ISPs provides better performance than the other. Except in Chicago where ISP1 is used
almost 90% of the time, both the ISPs in the other cities are used roughly equally in the optimal schedule.

To summarize, using the example of 2-multihoming, we showed that fine-grained control over
ISP connections can result in significant improvements for the subscriber network. Again, the two
ISP connections may not be used equally in the optimal schedule for traffic.

4.3 k-Multihoming, k ≥ 2

The previous two sections provided examples of the benefits of using multiple ISP connections.
However, due to limitations of the datasets, a number of more general questions still remain
unanswered. For example: (1) How does multihoming improve transfer speeds? (2) Do the performance
improvements depend on the destination, time-of-day or the day-of-week? And, (3) How much
impact does the exact choice, or number, of the ISPs have on the subscriber performance?

In this section, we address the above questions via comprehensive active measurements of RTT
and throughput performance taken over a large testbed consisting of nodes belonging to the server
infrastructure of the Akamai CDN. Following a similar methodology to that described in Section 4.1,
we emulate a multihoming scenario by selecting a few nodes in a metropolitan area, each singly-homed
to a different ISP, and use them collectively as a stand-in for a multihomed network.

The 68 nodes in our testbed span 17 cities in the continental U.S., averaging about four nodes
per city, connected to commercial ISPs of various sizes. As before, the nodes are chosen to avoid
multiple servers attached to the same ISP in a given city. The list of cities and the tiers of the
corresponding ISPs are shown in Figure 4.7(a). The tiers of the ISPs are derived from the work
in [108]. The geographic distribution of the testbed nodes is illustrated in Figure 4.7(b). We emulate
multihomed networks in 9 of the 17 metropolitan areas where there are at least 3 ISPs: Atlanta,
Bay Area, Boston, Chicago, Dallas, Los Angeles, New York, Seattle and Washington D.C.

In what follows, we first describe our data collection methodology. Then, we present the key
measurement observations in the following order. First, we present the improvements in RTT and
throughput performance from using k-multihoming, k ≥ 2. Second, we explore whether the
improvements due to multihoming are skewed by certain destinations, time of the day or day of the
week. Finally, we explore the impact of a suboptimal choice of ISPs on the observed subscriber
performance.
City                ISPs/tier
                    1    2    3    4    5
Atlanta, GA         2    0    1    1    0
Bay Area, CA        5    0    3    1    2
Boston, MA          1    0    1    0    1
Chicago, IL         6    1    0    1    0
Columbus, OH        0    1    0    1    0
Dallas, TX          3    0    0    1    0
Denver, CO          1    0    0    0    0
Des Moines, IA      0    1    0    0    0
Houston, TX         1    1    0    0    0
Los Angeles, CA     3    0    3    0    0
Miami, FL           1    0    0    0    0
Minneapolis, MN     0    0    1    0    0
New York, NY        3    2    2    1    0
Seattle, WA         2    0    2    1    1
St Louis, MO        1    0    0    0    0
Tampa, FL           0    1    0    0    0
Washington DC       3    0    3    0    2

(a) Testbed ISPs   (b) Node locations

Figure 4.7: Testbed details: The cities and distribution of ISP tiers in our measurement testbed are listed in (a).
The geographic location is shown in (b). The area of each dot is proportional to the number of nodes in the region.
4.3.1 Data Collection


We draw our observations from two datasets collected on the testbed described above. The first
data set consists of active HTTP downloads of small objects (10 KB) to measure the turnaround
times between the pairs of nodes. Every 6 minutes, we collect turnaround time samples between all
pairs of nodes in our testbed (including those within the same city). The second data set contains
throughput measurements from active downloads of 1 MB objects between the same set of
node-pairs. These downloads occur every 30 minutes between all node-pairs. Throughput is simply the
size of the transfer (1 MB) divided by the time between the receipt of the first and last bytes of
the response data from the server (source). Notice that this may not reflect the steady-state TCP
throughput along the path.
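
The throughput definition above amounts to a one-line computation, sketched below. The function name and timestamps are hypothetical, and treating 1 MB as 10^6 bytes is an assumption, since the text does not say whether 10^6 or 2^20 bytes is meant.

```python
def throughput_mbps(size_bytes: int, first_byte_s: float, last_byte_s: float) -> float:
    """Transfer size divided by the time between the first and last response bytes."""
    duration = last_byte_s - first_byte_s
    if duration <= 0:
        raise ValueError("last byte must arrive after the first byte")
    return size_bytes * 8 / duration / 1e6


# A 1 MB object (assumed to be 10**6 bytes) whose response bytes span 2 seconds.
rate = throughput_mbps(1_000_000, first_byte_s=0.25, last_byte_s=2.25)
```

Because the clock starts at the first response byte, connection setup and request latency are excluded from this estimate.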

Since our testbed nodes are part of Akamai's production infrastructure, we limit the frequencies
at which all-pairs measurements are collected as described above. To ensure that all active probes
between pairs of nodes observe similar network conditions, we scheduled them to occur within a
30 second interval for the round-trip time data set, and within a 2 minute interval for the throughput
data set. For the latter, we also ensure that an individual node is involved in at most one transfer
at any time so that our probes do not contend for bandwidth at the source or destination network.
The transfers may interfere elsewhere in the Internet, however. Also, since our testbed nodes are all
located in the U.S., the routes we probe, and consequently, our observations, are U.S.-centric.
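
The constraint that a node takes part in at most one transfer at a time can be met by grouping transfers into rounds with disjoint endpoints. The text does not describe how the transfers were actually ordered, so the greedy packing below is only one possible sketch, and `schedule_rounds` is a hypothetical name.

```python
from itertools import permutations
from typing import List, Sequence, Set, Tuple

Transfer = Tuple[str, str]  # (source node, destination node)


def schedule_rounds(nodes: Sequence[str]) -> List[List[Transfer]]:
    """Pack all ordered node pairs into rounds in which no node appears twice."""
    pending: List[Transfer] = list(permutations(nodes, 2))
    rounds: List[List[Transfer]] = []
    while pending:
        busy: Set[str] = set()
        current: List[Transfer] = []
        deferred: List[Transfer] = []
        for src, dst in pending:
            if src in busy or dst in busy:
                deferred.append((src, dst))  # one endpoint is already transferring
            else:
                current.append((src, dst))
                busy.update((src, dst))
        rounds.append(current)
        pending = deferred
    return rounds


rounds = schedule_rounds(["n1", "n2", "n3", "n4"])
```

Transfers within a round can run concurrently without contending at either endpoint; the greedy packing does not try to minimize the number of rounds.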

The round-trip time data set was collected from Thursday, December 4th, 2003 through Wednesday,
December 10th, 2003. The throughput measurements were collected between Thursday, May
6th, 2004 and Tuesday, May 11th, 2004 (both days inclusive).

4.3.2 k-Multihoming Improvements

To understand the performance benefits of k-multihoming, we adopt a methodology similar to the one
described in Section 4.2. For each download, we compare the turnaround time achieved by using
the best ISP among all those available in the city, with that of using the best ISP in a candidate
multihoming option. We average this ratio over transfers to all testbed machines, and report the
minimum normalized performance metric. The minimum is taken over all multihoming options. As
before, we only compare transactions with simultaneous successful transfers over all ISPs.

Formally, let M_best(A_i, t) denote the best value of the turnaround time for a transfer to testbed
node A_i (i = 1, ..., 68) at time t, across all available ISPs in a city. For a k-multihoming option
OP_k, let M_OPk(A_i, t) be the best turnaround time across just the ISPs in the set OP_k. We compute
the RTT performance benefit from the option OP_k as follows:


$$RTT_{OP_k} = \frac{\sum_{i,t} \left( M_{OP_k}(A_i, t) / M_{best}(A_i, t) \right)}{\mathit{Numvalid}(t)}$$
Figure 4.8: k-Multihoming Benefits: Figure (a) plots the improvement in web turnaround times from
k-multihoming. Figure (b) plots the improvements in throughput.


The sum is over the times t when transfers occur from all the ISPs in the city to node A_i.
Numvalid(t) is the number of such time instances. We compute the throughput benefits in a
similar fashion:


$$Thru_{OP_k} = \frac{\sum_{i,t} \left( M_{best}(A_i, t) / M_{OP_k}(A_i, t) \right)}{\mathit{Numvalid}(t)}$$


The slight difference in the definition of the RTT and throughput metrics arises from the fact
that we are interested in how multihoming can provide lower RTTs and higher transfer speeds.
With the ratios oriented this way, both metrics are at least 1, and a value of 1 is optimal.
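
The two metrics can be sketched side by side as follows. This is a minimal illustration, assuming every entry already corresponds to a time instance with successful transfers over all ISPs; the data layout, function names and toy values are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical layout: (destination node, time slot) -> measurement per ISP.
# Every entry is assumed to have simultaneous successful transfers over all ISPs.
Measurements = Dict[Tuple[str, int], Dict[str, float]]


def rtt_metric(rtts: Measurements, option: Sequence[str]) -> float:
    """Mean of M_OPk / M_best, where a lower RTT is better."""
    ratios = [
        min(per_isp[isp] for isp in option) / min(per_isp.values())
        for per_isp in rtts.values()
    ]
    return sum(ratios) / len(ratios)


def throughput_metric(rates: Measurements, option: Sequence[str]) -> float:
    """Mean of M_best / M_OPk, where a higher throughput is better."""
    ratios = [
        max(per_isp.values()) / max(per_isp[isp] for isp in option)
        for per_isp in rates.values()
    ]
    return sum(ratios) / len(ratios)


rtts = {("d1", 0): {"A": 20.0, "B": 30.0, "C": 10.0},
        ("d2", 0): {"A": 40.0, "B": 20.0, "C": 50.0}}
rates = {("d1", 0): {"A": 4.0, "B": 2.0, "C": 8.0},
         ("d2", 0): {"A": 5.0, "B": 10.0, "C": 2.0}}
rtt_ab = rtt_metric(rtts, ["A", "B"])            # (20/10 + 20/20) / 2
thru_ab = throughput_metric(rates, ["A", "B"])   # (8/4 + 10/10) / 2
```

Using all three ISPs yields exactly 1 for both metrics, which is why smaller values indicate an option closer to the best achievable performance.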

In Figure 4.8, we plot the normalized RTT and throughput benefits from k-multihoming as a
function of the number of ISPs. Two key facts are apparent from Figure 4.8(a):

•  The average RTT improves dramatically when the subscriber uses the best set of 2 or more
ISPs, relative to using the single best ISP. The normalized performance metric is lowered
by 0.4, reflecting an average 25% improvement in RTTs. Intuitively, this occurs because a
second, well-chosen ISP could potentially double the diversity in paths to various destinations.
This improved choice in paths helps end networks avoid serious performance problems along
any single ISP's paths.

•  There is strong evidence of diminishing returns. Beyond 3 or 4 ISPs, the marginal benefit
from using additional ISPs is small. Again, intuitively, this occurs because a fourth or a fifth
ISP can provide very little additional diversity beyond what a well-chosen set of 3 ISPs already
provides.

In terms of throughput, multihoming improves performance by as much as 20% relative to a
single ISP (see Figure 4.8(b)). Again, we notice diminishing returns in performance beyond 3 ISPs.
Figure 4.9: Per-destination performance: Figures (a) and (b) plot the absolute improvements in RTT and
throughput performance, respectively, from 3-multihoming relative to 1-multihoming.


4.3.3  Unrolling  the  Averages 

So far, we presented averages of the performance improvements from multihoming route control. In
this section, we present the underlying distributions in the performance improvements. Our goal is to
understand if the averages are particularly skewed by: (1) certain destinations, for each source city,
(2) a few measurement samples on which multihoming offers significantly better performance
than a single ISP, or (3) time-of-day or day-of-week effects.

Performance per destination. In Figure 4.9(a), we show the distribution of the average difference
between the best 3-multihoming path and the path via the single best ISP (i.e., each point represents
one destination) for various cities. To illustrate, for a subscriber in Seattle, 3-multihoming improves
the average RTT per destination by more than 10ms for about 60% of the destinations, and more
than 15ms for about 30% of the destinations. In Los Angeles, the improvement due to multihoming
is less dramatic. For about 60% of the destinations, the improvement in the average RTT due to
multihoming is under 5ms. The key point to notice, however, is that for the 9 cities we consider, there
exist a few destinations to which multihoming can offer significantly improved RTT performance.

In  Figure  4.9(b),  we  consider  the  distribution  of  the  average  throughput  difference  of  the  best  3- 
multihoming  path  and  the  best  single  ISP.  We  see  that  the  throughput  difference  is  more  than  3  Mbps 
for  15^0%  of  the  destinations.  We  also  note  that,  for  1-10%  of  the  destinations,  the  difference  is 
in  excess  of  8  Mbps.  As  with  RTT,  these  observations  imply  that  the  transfer  speeds  to  certain 
destinations  could  be  substantially  higher  when  the  subscriber  is  multihomed. 
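
The per-destination view can be illustrated with a short sketch: average the RTT difference for each destination, then ask what fraction of destinations exceed a threshold. The sample layout, function names and values below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical sample: (destination, RTT via best single ISP, RTT via best of 3 ISPs), in ms.
Sample = Tuple[str, float, float]


def mean_improvement_per_destination(samples: Iterable[Sample]) -> Dict[str, float]:
    """Average RTT difference (single ISP minus 3-multihoming) for each destination."""
    diffs: Dict[str, List[float]] = defaultdict(list)
    for dest, single_ms, multi_ms in samples:
        diffs[dest].append(single_ms - multi_ms)
    return {dest: sum(v) / len(v) for dest, v in diffs.items()}


def fraction_above(values: Iterable[float], threshold: float) -> float:
    """Fraction of values strictly greater than the threshold."""
    values = list(values)
    return sum(v > threshold for v in values) / len(values)


samples = [("d1", 50.0, 30.0), ("d1", 40.0, 36.0),
           ("d2", 25.0, 22.0), ("d2", 21.0, 20.0)]
per_dest = mean_improvement_per_destination(samples)
share = fraction_above(per_dest.values(), 10.0)
```

Sweeping the threshold over a range of values traces out a complementary distribution of the kind plotted per city in Figure 4.9.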

Mean versus other statistics. In Figures 4.10(a) and (b) we plot the average, median, and 10th
and 90th percentiles of the difference in RTT and throughput, respectively, between 3-multihoming
and 1-multihoming. In Figure 4.10(a) we see that the median RTT difference is fairly small. More
than 90% of the median RTT differences are less than 10ms. However, the 90th percentile of the
difference is much higher, with roughly 25% greater than 20ms. The 90th percentile throughput
differences in Figure 4.10(b) are also significant: more than 8 Mbps about 25% of the time.
Considering the median throughput differences, we see that a significant fraction (about 20%) are greater
than 3 Mbps. These observations suggest that while multihoming improves the overall performance
of all transfers by modest amounts, the performance of a small yet significant fraction could be
improved significantly by carefully scheduling traffic across ISP links.

Figure 4.10: Underlying distributions: Figure showing the mean, median, 10th percentile and 90th percentile
difference across various source-destination pairs. Figure (a) plots RTT, while figure (b) plots throughput (pessimistic
estimate).

Figure 4.11: Time of day effects: Figures plotting the impact of the time-of-day (Figure (a)) and day-of-week
(Figure (b)) on RTT performance. All times are in EDT.

Time-of-day and day-of-week effects. We also consider the effects of hourly and daily network
usage patterns on the relative performance of 3-multihoming and 1-multihoming. It might be
expected that 1-multihoming would perform particularly poorly during peak periods. In Figure 4.11(a)
we examine time-of-day effects on the average difference in round-trip times. Notice that the RTT
performance improvement does show a correlation with the time of the day. While the improvement
due to careful route selection is minimal in the evenings and weekends, the differences are more
pronounced during the remaining time-periods. We also examine daily patterns to determine whether
the differences are greater during particular days of the week (Figure 4.11(b)). The correlation
between the performance improvements and the days of the week is not as significant. As expected,
we observe the improvements to be smaller during weekends. However, the improvements for the
other days of the week are not substantially different.

Figure 4.12: Impact of sub-optimal choices: Graph (a) shows the expected RTT performance metric from a
random k-multihoming option. Graph (b) shows the performance of the worst k-multihoming option.

Figure 4.13: Choice of ISPs: Figures (a) and (b) show the RTT performance from various ISP selection policies
for San Francisco and Los Angeles, respectively.

4.3.4  Impact  of  the  Choice  of  ISPs 

In Figure 4.12, we illustrate the impact of choosing a sub-optimal set of ISPs for a k-multihoming
solution. We assume that, given a choice of ISPs, a subscriber always uses the best ISP among
the available set for its transfers. Comparing Figures 4.12(a) and 4.8(a), for k < 4, we see that
the RTT performance metric due to a random choice of k ISPs is more than 50% higher (e.g., k = 2
for Chicago). The difference between optimal and random choices of ISPs is substantial even for
higher values of k. In Figure 4.12(b) we show the performance of the worst k-multihoming option.
A poor choice of upstream ISPs could result in performance that is at least twice as bad as the
optimal choice (compare, e.g., k = 2 for Chicago, in Figures 4.12(b) and 4.8(a)). Therefore, while
multihoming offers potential for significant performance benefits, it is crucial to carefully choose
the right set of ISPs.
Finally, we explore the relative RTT performance from various strategies for selecting ISPs. In
particular, Figure 4.13(a) compares the RTT performance metric of optimal, random and worst-case
choice of multihoming ISPs for a subscriber in San Francisco. In addition, we also show the RTT
performance metric for the case when the subscriber multihomes to the top k individual ISPs (in
terms of average RTT performance). Not only does selecting the top k individual ISPs outperform
a random choice, it also provides RTT performance similar to the optimal choice. Nevertheless, a
more informed selection of ISPs (than simply choosing the top k) could yield up to 5-10% better
RTT performance on average (see, for example, k = 3, 4 in Figure 4.13(a)). We show a similar set
of results for Los Angeles in Figure 4.13(b). In this case, choosing the top k ISPs yields RTT
performance identical to the optimal choice of ISPs.
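
The four selection policies compared here (optimal, random, worst-case and top k individual ISPs) can be sketched by enumerating all k-subsets of the available ISPs. This is an illustrative sketch on hypothetical toy data; treating "random" as the mean over all equally likely k-subsets is an assumption about how the expected metric is computed.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, Sequence

Sample = Dict[str, float]  # hypothetical layout: ISP name -> RTT for one transfer


def rtt_metric(samples: Sequence[Sample], option: Sequence[str]) -> float:
    """Mean ratio of the best RTT within `option` to the best RTT over all ISPs."""
    return mean(min(s[i] for i in option) / min(s.values()) for s in samples)


def compare_policies(samples: Sequence[Sample], k: int) -> Dict[str, float]:
    isps = sorted(samples[0])
    scores = [rtt_metric(samples, opt) for opt in combinations(isps, k)]
    # "Top k": the k ISPs with the best individual (1-multihoming) metric.
    top_k = sorted(isps, key=lambda i: rtt_metric(samples, [i]))[:k]
    return {
        "optimal": min(scores),
        "random": mean(scores),  # expected metric of a uniformly random k-subset
        "worst": max(scores),
        "top_k": rtt_metric(samples, top_k),
    }


samples = [
    {"A": 10.0, "B": 20.0, "C": 40.0},
    {"A": 30.0, "B": 15.0, "C": 45.0},
    {"A": 20.0, "B": 12.0, "C": 10.0},
]
result = compare_policies(samples, 2)
```

In this toy input the top-2 heuristic happens to coincide with the optimal pair, as observed for Los Angeles; in general it is only guaranteed to be no better than the optimal choice.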

In summary, we find that k-multihoming can substantially improve RTT and throughput
performance of subscriber networks. However, there is little additional benefit from employing more
than 3 ISP connections. Also, the 3 ISPs themselves must be chosen carefully to realize the
potential benefits. In view of this, a good heuristic for the subscriber network is to select the top three
individual ISPs serving its city.

4.4  Resilience  to  Path  Failures 

End networks employing a single ISP connection must use the paths provided by their ISPs to
maintain connectivity with Internet destinations. As a result, failures inside their ISPs' networks
inevitably result in connectivity disruptions at the end networks. Past studies (see, for example, [39])
have shown that such failures could last up to several minutes, and severely impact end-user Internet
access experience. Such end-networks can vastly improve their resilience to service interruptions
by relying on multihoming route control. By monitoring the availability of paths via each of their
ISPs and cleverly scheduling traffic on ISPs with available paths, end-networks could substantially
reduce connectivity outages. In this section, we analyze two distinct datasets collected over the
testbed described in Section 4.3 (see Figure 4.7) to quantify the improvements in resilience from
using multiple ISP connections. We focus on the special case where the subscriber employs three
or fewer ISPs for multihoming.

4.4.1  Active  Measurements  of  Path  Availability 

In our first approach, we perform two-way ICMP pings between the 68 nodes in our testbed
(Figure 4.7). The ping samples were collected between all node-pairs over a five day period from January
23rd, 2004 to January 28th, 2004. The probes are sent once every minute with a one second timeout.
If no response is received within a second, the ping is deemed lost. A path is considered to have
failed if at least 3 consecutive pings (each one minute apart) from the source to the destination are
lost. From these measurements we derive “failure epochs” on each path. The epoch begins when the
third failed probe times out, and ends on the first successful reply from a subsequent probe. These
epochs are the periods of time when the route between the source and destination may have failed.

Figure 4.14: End-to-end failures: Distribution of the availability on the end-to-end paths, with and without
multihoming. The ISPs in the 2- and 3-multihoming cases are the best 2 and 3 ISPs in each city based on RTT
performance, respectively.
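
The derivation of failure epochs from the ping record can be sketched as follows. The function name, constants and sample record are hypothetical, and the end of an epoch is approximated by the send time of the first answered probe, since the reply's round-trip time is small relative to the one-minute probe spacing.

```python
from typing import List, Optional, Sequence, Tuple

PROBE_INTERVAL_S = 60   # one probe per minute
TIMEOUT_S = 1           # a probe with no reply within one second is lost
LOSSES_TO_FAIL = 3      # three consecutive losses mark a failure


def failure_epochs(replies: Sequence[bool]) -> List[Tuple[int, Optional[int]]]:
    """Return (start, end) times in seconds; replies[i] is True if probe i was answered.

    Probe i is sent at i * PROBE_INTERVAL_S. An epoch starts when the third
    consecutive lost probe times out and ends at the next answered probe.
    `end` is None if the path had not recovered by the last probe.
    """
    epochs: List[Tuple[int, Optional[int]]] = []
    lost_run = 0
    start: Optional[int] = None
    for i, ok in enumerate(replies):
        sent_at = i * PROBE_INTERVAL_S
        if ok:
            if start is not None:
                epochs.append((start, sent_at))
                start = None
            lost_run = 0
        else:
            lost_run += 1
            if lost_run == LOSSES_TO_FAIL:
                start = sent_at + TIMEOUT_S
    if start is not None:
        epochs.append((start, None))
    return epochs


# Two losses (no epoch), then four losses in a row (one epoch), then recovery.
replies = [True, False, False, True, False, False, False, False, True, True]
epochs = failure_epochs(replies)
```

The first two losses do not open an epoch because the run is interrupted before the third loss, which is why this method cannot detect short failures.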

This method of deriving failure epochs has a few limitations. Firstly, since we wait for three
consecutive losses, we cannot detect failures that last less than 3 minutes. As a result, our analysis
does not characterize the ability of multihoming to avoid such short failures. Secondly, ping packets
may also be dropped due to congestion rather than path failure. Unfortunately, from our
measurements we cannot easily determine if the losses are due to failures or due to congestion. Finally, the
destination may not reply with ICMP echo reply messages within one second, causing us to record
a loss. To mitigate this factor, we eliminate paths for which the fraction of lost probes exceeds 10%
from our analysis. Due to the above reasons, the path failures we identify should be considered an
over-estimate of the number of failures lasting three minutes or longer.

From the failure epochs on each end-to-end path, we compute the corresponding availability,
defined as follows:

$$\mathit{Availability} = 100 \times \left( 1 - \frac{\sum_i T_p(i)}{T} \right)$$

where T_p(i) is the length of failure epoch i along the path, and T is the length of the
measurement interval (5 days). The total sum of the failure epochs can be considered the observed
“downtime” of the path.
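
As a worked example of the availability definition, the sketch below sums the epoch lengths and normalizes by the measurement interval. The function name and the two sample epochs are hypothetical.

```python
from typing import Iterable, Tuple


def availability(epochs: Iterable[Tuple[float, float]], interval_s: float) -> float:
    """Percentage of the measurement interval not covered by failure epochs."""
    downtime = sum(end - start for start, end in epochs)
    return 100.0 * (1.0 - downtime / interval_s)


FIVE_DAYS_S = 5 * 24 * 3600
# Two hypothetical epochs lasting 6 and 30 minutes (2160 seconds of downtime).
pct = availability([(1000, 1360), (50000, 51800)], FIVE_DAYS_S)
```

With 2160 seconds of downtime in a 432000-second interval, the path is 99.5% available, which gives a sense of how little downtime separates the availability levels discussed in this section.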

In Figure 4.14, we show a CDF of the availability on the paths we measured, with and without
multihoming. When no multihoming is employed, we see that all paths have at least 91% availability
(not shown in the figure). Fewer than 5% of all paths have less than 99.5% availability. Route control
with multihoming significantly improves the availability on the end-to-end paths, as shown by the
2- and 3-multihoming availability distributions. For both 2- and 3-multihoming, we consider the
combinations of ISPs providing the best RTT performance in a city. Even when route control uses
only 2 ISPs, less than 1% of the paths originating from the cities we studied have an availability
under 99.9%. The minimum availability across all the paths is 99.85%, which is much higher than
without multihoming. Also, more than 94% of the paths from the various cities to the respective
destinations do not experience any observable failures during the 5 day period (i.e., availability of
100%). With three ISPs, the availability improves further, though only slightly.


4.4.2  Path  Availability  Analysis 


Since the vast majority of paths did not fail even once during our relatively short measurement
period, our second approach uses statistics derived from previous long-term measurements to ascertain
availability. Feamster et al. collected failure data using active probes between nodes in the RON
testbed approximately every 30 seconds for several months [39]. When three consecutive probes on
a path were lost, a traceroute was triggered to identify where the failure appeared (i.e., the last router
reachable by the traceroute) and how long it lasted. The routers in the traceroute data were also
labeled with their corresponding AS number and classified as border or internal routers. We use
a subset of these measurements on paths between non-DSL nodes within the U.S. collected between
June 26, 2002 and March 12, 2003 to infer failure rates in our testbed. Though this approach has
some drawbacks (which we discuss later), it allows us to obtain a view of longer-term availability
benefits of route control that is not otherwise possible from direct measurements on our testbed.

We first estimate the availabilities of different router classes (i.e., the fraction of time they are
able to correctly forward packets). We classify routers in the RON testbed traceroutes by their AS
tier (using the method in [108]) and their role (border or internal router). Note that the inference
of failure location is based on router location, but the actual failure could be at the link or router
attached to the last responding router.

The availability estimate is computed as follows: If $T_F^C$ is the total time during which failures
attributed to routers of class $C$ were observed, and $N_d^C$ is the total number of routers of class $C$ we
observed on each path on day $d$,^ then we estimate the availability of a router (or attached link) of
class $C$ as:

$$\text{Availability}_C = 100 \times \left(1 - \frac{T_F^C}{\sum_d N_d^C \times \text{one\_day}}\right)$$


In  other  words,  the  fraction  of  time  unavailable  is  the  aggregate  failure  time  attributed  to  a  router  of 
class  C  divided  by  the  total  time  we  expect  to  observe  a  router  of  class  C  in  any  path.  Our  estimates 
for  various  router  classes  are  shown  in  Table  4.2. 
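
As a rough illustration of the per-class estimate, the following Python sketch divides the failure time attributed to a router class by the total router-time observed for that class. The function name, argument names, and the example numbers are hypothetical and are not taken from the RON data set.

```python
ONE_DAY_S = 24 * 3600  # length of one day, in seconds


def class_availability(failure_time_s, routers_seen_per_day):
    """Estimated availability (%) of a router (or attached link) of one class.

    failure_time_s: total failure time attributed to routers of the class.
    routers_seen_per_day: one count per day of routers of the class observed on paths.
    """
    observed_s = sum(routers_seen_per_day) * ONE_DAY_S
    if observed_s <= 0:
        raise ValueError("no routers of this class were observed")
    return 100.0 * (1.0 - failure_time_s / observed_s)


# Hypothetical class: 1,000 routers seen on each of 10 days, with 5 hours of
# failures attributed to the class in total.
avail = class_availability(5 * 3600, [1000] * 10)
```

Because the denominator counts every observed router for a full day, a class that appears on many paths needs a large amount of attributed failure time before its estimated availability drops noticeably.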

To apply the availability statistics derived from the RON data set, we identified and classified
the routers on paths between nodes in our testbed. We performed traceroute measurements
approximately every 20 minutes between nodes in our CDN testbed from December 4, 2003 to December 11,
2003. For our analysis we used the most often observed path between each pair of nodes; in almost

^The dataset only included a single successful traceroute per day. Therefore, we assumed that all active probes took
the same route each day.

AS Tier   Location   Availability (%)
1         internal   99.940
1         border     99.985
2         internal   99.995
2         border     99.977
3         internal   99.999
3         border     99.991
4         internal   99.946
4         border     99.994
5         internal   99.902
5         border     99.918

Table 4.2: Availability across router classes: Estimated availability for routers or links classified by AS tier and
location. We consider a border router as one with at least one link to another AS.


all cases, this path was used more than 95% of the time. Using the router availabilities estimated
from the RON data set, we estimate the availability of routes in our testbed when we use
multihoming. When estimating the simultaneous failure probability of multiple paths, it is important
to identify which routers are shared among the paths so that failures on those paths are accurately
correlated. Because determining router aliases was difficult on some paths in our testbed,^ we
conservatively assumed that the routers at the end of paths toward the same destination were identical if
they belonged to the same sequence of ASes. For example, if we had two router-level paths destined
for a common node that map to the ASes A A B B C C and D D D B C C, respectively, we
assume the last 3 routers are the same (since B C C is common). Even if in reality these routers
are different, failures at these routers are still likely to be correlated. The same heuristic was used to
identify identical routers on paths originating from the same source node. We assume other failures
are independent.
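
One way to picture the resulting calculation is sketched below. It assumes the shared routers have already been identified (for example, with the AS-sequence heuristic described above) and that all remaining failures are independent. The function name and the way the inputs are split into shared and unshared routers are assumptions of this sketch, not a description of the actual analysis code; the example availabilities are borrowed from Table 4.2 purely as plausible inputs.

```python
from math import prod


def multihomed_availability(shared, unshared_per_path):
    """Estimate availability (as a fraction) of a destination under route control.

    shared: availabilities of routers common to every candidate path.
    unshared_per_path: one list per ISP path, holding the availabilities of
        the routers that only that path traverses.
    The destination is reachable when every shared router is up and at least
    one path has all of its unshared routers up (independent failures assumed).
    """
    shared_up = prod(shared)
    all_paths_down = prod(1.0 - prod(path) for path in unshared_per_path)
    return shared_up * (1.0 - all_paths_down)


# Hypothetical example: one shared router near the destination and two
# routers on each ISP-specific portion of the path.
shared = [0.99991]
paths = [[0.99940, 0.99985], [0.99940, 0.99977]]
single = multihomed_availability(shared, paths[:1])  # one ISP only
dual = multihomed_availability(shared, paths)        # two ISPs with route control
```

In this model the shared routers cap the achievable availability: adding more ISPs can only push the result toward the product of the shared routers' availabilities, never beyond it.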

A few aspects of this approach may introduce biases in our analysis. First, the routes on RON
testbed paths may not be representative of the routes in our testbed, though we tried to ensure
similarity by using only paths between relatively well-connected RON nodes in the U.S. (a good
fraction of RON nodes belong to home DSL users). In addition, we observed that the availabilities
across router classes in the RON dataset did not vary substantially across different months, so we do
not believe the difference in time frames impacted our results. Second, there may be routers or links
in the RON data set that fail frequently and bias the availability of a particular router type. However,
since traceroutes are initiated only when a failure is detected, there is no way for us to accurately
estimate the overall failure rates of all individual routers. Third, it is questionable whether we should
assign failures to the last reachable router in a traceroute; it is possible that the next (unknown) or

^We found that several ISPs block responses to UDP probe packets used by IP alias resolution tools such as Ally [105].

Figure 4.15: Availability comparison: Comparison of availability averaged across paths originating from six cities
using a single ISP and using 3-multihoming. ISPs are chosen based on their round-trip time performance.

an even further router in the path is actually the one that failed. Nevertheless, our availabilities still
estimate how often failures are observed at or just after a router of a given type.

Figure 4.15 compares the average availability using multihoming route control on paths
originating from 6 cities to all destinations in our testbed. As expected from our active measurements,
the average availability along the paths in our testbed is relatively high, even for direct paths.
3-multihoming improves the average availability by 0.15-0.24% in all the cities (corresponding to
about 13-21 more hours of availability each year). Here, the availability is primarily upper bounded
by the availability of the routers or links immediately before the destination that are shared by all
three paths as they converge.
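
The conversion from percentage points of availability to hours per year quoted above can be checked with a short calculation. The sketch below assumes a 365-day year; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year


def extra_hours_per_year(availability_gain_pct):
    """Convert an availability gain in percentage points to hours per year."""
    return availability_gain_pct / 100.0 * HOURS_PER_YEAR


low = extra_hours_per_year(0.15)   # roughly 13 hours
high = extra_hours_per_year(0.24)  # roughly 21 hours
```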

In  summary,  route  control  can  improve  the  availability  of  end-networks,  in  addition  to  perfor¬ 
mance.  While  not  all  failures  can  be  fully  eliminated,  the  availability  offered  by  route  control  is, 
nevertheless,  very  good  for  all  practical  purposes. 

4.5  Summary  of  Observations  and  their  Implications 

In this chapter we studied the potential improvements in Internet performance and reliability from
multihoming. The key observations from this study are outlined in Table 4.3. Our study
establishes that multihoming route control could significantly improve the Internet RTTs, throughputs
and reliability of end networks. Furthermore, beyond 3 ISP connections the marginal benefit from
employing additional ISPs is minimal. Also, it is important to make an informed choice of ISPs to
realize the potential benefits of multihoming. A good strategy is for the end network to multihome
to the top 2 or 3 individual ISPs serving its city.

Also, in our evaluation of multihoming route control, we assumed that network destinations
were all singly-homed. Despite this restriction, we observed significant performance improvements.
In some cases, however, the destination network itself could be multihomed. Furthermore, the
destination and the multihomed source network could be administratively related, for example, branch
offices of the same parent organization. In such cases, the performance experienced by either end

Multihoming route control can lower Internet RTTs by about 25% or more relative to
the best single ISP. In addition, multihoming could yield up to 20% higher transfer
speeds.

Our measurements show that substantial fractions of transfers from nine major U.S.
cities to various Internet destinations could experience at least a 25 ms improvement
in RTTs and an 8 Mbps improvement in transfer speeds from route control.

In terms of reliability, we observe that multihoming to 2 or 3 ISPs eliminates most
failures experienced by a singly-homed network.

We see strong evidence of diminishing returns. The improvements in Internet
performance are marginal beyond 3 ISP connections.

A careful choice of ISPs is important to realize the potential benefits of multihoming
route control. A poor, uninformed choice of ISPs, for example, could yield RTTs that
are more than double the RTTs from the optimal choice.

In most of the cities we study, employing the top 3 individual ISPs for multihoming
provides roughly the same performance improvements as employing the best set of 3
ISPs.

Table 4.3: Benefits of Multihoming Route Control: Summary of key observations regarding multihoming.

when communicating with the other could be heavily optimized by: (1) coordinating the choices
of the ISPs that the two ends connect to (for example, if both ends connected to the same set of two
tier-1 ISPs, then all network paths between the two would be at most two AS hops long); and (2)
coordinating the choice of routes (for example, if both ends decided to use the same ISP connection
for a given data transfer, the transfer would traverse a single ISP domain, avoiding all inter-domain
policies and taking the shortest-IP-hop route).

Notice that in Internet routing, each ISP provides a subscriber exactly one BGP path per
destination. By employing multihoming, and choosing a good set of ISPs, an end-network enjoys a slightly
richer selection of BGP routes per destination (i.e., one route per ISP for every destination) that it
can choose from in an informed manner. In effect, therefore, the observations in this chapter have
shown that when the routing flexibility of end-networks is improved by moderate amounts using
multihoming route control, their Internet performance and reliability can be significantly improved.

The contributions of our measurement study of multihoming can be simply summarized as
follows: By improving the routing flexibility of endpoints by moderate amounts, their Internet
performance and reliability can be vastly improved. In the next chapter, we ask whether this flexibility is
sufficient, or whether enabling much better routing flexibility at end-networks (e.g., using overlay
networks) could result in superior performance and reliability.

Chapter 5


A  Comparison  of  Overlay  Routing  and  Multihoming  Route  Control 


The typical model for enterprise Internet access today is for the enterprise to buy Internet
connectivity from an ISP and route its traffic via the ISP. In this model, the ISP determines how the enterprise’s
data should be routed across the Internet. Often, when the destination involved is connected to a
different ISP, the enterprise’s own ISP employs the Border Gateway Protocol (BGP) to route traffic toward
the destination. However, several past analyses of BGP have highlighted serious inefficiencies in
its functioning. For example, the routes enabled by BGP often yield sub-optimal RTTs and transfer
speeds. Moreover, failures in BGP, resulting from events such as router malfunction, often require
unacceptably long reconvergence and stabilization times. Quite aptly, then, these limitations of
conventional BGP-based Internet routing are often held responsible
for failures and poor performance of end-to-end Internet transfers.

A number of studies have shown that the underlying connectivity of the Internet is actually
capable of providing much greater performance and resilience than endpoints currently receive.
These studies (see, for example, Detour [99, 100] and RON [14]) demonstrated that using overlay
routing to bypass BGP’s policy-driven routing enables quicker reaction to failures and improved
end-to-end performance. In this approach, endpoints can route their traffic via intermediate overlay
nodes deployed around the Internet. This helps endpoints bypass the default routes determined by
BGP and avoid performance and availability problems along these routes.

In this chapter, we question whether overlay routing is required to make the most of the
underlying connectivity, or whether a more intelligent selection of BGP routes at an endpoint is sufficient.
Specifically, we compare and contrast overlay routing against multihoming route
control. As noted in the previous chapter, multihoming route control enables end-networks to
intelligently control and use BGP paths provided by their multiple ISPs. Furthermore, multihoming
does not require any changes or improvements to the underlying BGP protocol. Our goal, then, is
to answer the following question:

How much benefit, in terms of Internet performance or resilience, does overlay routing
provide over multihoming route control?

If the benefit is small, then BGP path selection (based on multihoming) is not as inferior as it
is held to be, and good end-to-end performance and reliability are achievable even when operating
completely within standard Internet routing. On the other hand, if overlays yield significantly better
performance and reliability characteristics, we can conclude that BGP is fundamentally limited. In
such a situation, it is important to develop alternate bypass architectures such as overlay routing or
invent new wide-area routing protocols.

Using extensive active downloads and traceroutes between 68 Akamai CDN servers in the
testbed described earlier in Section 4.3, we compare multihoming route control and overlay routing
in terms of three key metrics: round-trip delay, throughput, and availability. Our measurement
results suggest that when route control is employed along with multihoming, it can offer performance
similar to overlays in terms of round-trip delay and throughput. On average, the round-trip times
achieved by the best BGP paths (selected by an ideal route control mechanism using 3 ISPs) are
within 5-15% of the best overlay paths (selected by an ideal overlay routing scheme also
multihomed to 3 ISPs). Similarly, the throughput on the best overlay paths is only 1-10% better than
the best BGP paths. We also show that the marginal difference in the RTT performance can be
attributed mainly to overlay routing’s ability to select shorter paths, and that this difference can be
reduced further if ISPs implement cooperative peering policies. Our comparison of the end-to-end
path availability provided by either approach shows that multihoming route control, like overlay
routing, is able to significantly improve the availability of end-to-end paths.

Chapter outline. In Section 5.1 of this chapter, we provide an overview of our approach to
comparing overlay routing and route control. In Section 5.2, we analyze the RTT and throughput
performance differences between route control and overlay routing and consider some possible reasons for
the differences. In Section 5.3, we contrast the end-to-end availability offered by the two schemes.
Section 5.4 summarizes the observations in this chapter, discusses the implications of the results,
and presents some limitations of our study.

5.1  Terminology 

Our objective is to understand whether the modest flexibility of multihoming, coupled with route
control, is able to offer end-to-end performance and resilience similar to overlay routing. In order
to answer this question, we evaluate an idealized form of multihoming route control driven by
the three key assumptions outlined in Chapter 4: perfect information of ISP performance, low
overhead in shifting traffic across ISPs, and mechanisms for inbound route control. To ensure a fair
comparison, we study a similarly agile form of overlay routing where the endpoint has timely and
accurate knowledge of the best performing, or most available, end-to-end overlay paths. Frequent
active probing of each overlay link makes it possible to select and switch to the best overlay path at
almost any instant when the size of the overlay network is small (~50 nodes).*

*Such frequent probing is infeasible for larger overlays [14].

We compare overlay routing and route control with respect to the degree of flexibility in paths
available at the end-network. In general, this flexibility is represented by k, the number of ISPs
available to either technique at the end-network. For route control, we consider the notion of
k-multihoming from Chapter 4, where we evaluate the performance and reliability of end-to-end
candidate paths induced by a combination of k ISPs. For overlay routing, we introduce a similar notion
of k-overlays, where k is the number of ISPs available to an endpoint for any end-to-end overlay
path. In other words, this is simply overlay routing in the presence of k ISP connections. When
comparing k-multihoming with k-overlays, we report results based on the combination of k ISPs
that gives the best performance (RTT or throughput) across all destinations.

Figure 5.1 illustrates some possible route control and overlay configurations to clarify the
terminology used in this chapter. For example, (a) shows the case of conventional BGP routing with
a single default ISP (i.e., 1-multihoming). Figure 5.1(b) depicts endpoint route control with three
ISPs (i.e., 3-multihoming). Overlay routing with a single first-hop ISP (i.e., 1-overlay) is shown
in Figure 5.1(c), and Figure 5.1(d) shows the case of additional first-hop flexibility in a 3-overlay
routing configuration.

When comparing overlay routing and route control, we seek to answer the following key
questions:

1.  On  what  fraction  of  end-to-end  paths  does  overlay  routing  outperform  multihoming  route 
control  in  terms  of  RTT  and  throughput?  In  these  cases,  what  is  the  extent  of  the  performance 
difference? 

2.  What  are  the  reasons  for  the  performance  differences?  For  example,  must  overlay  paths 
violate  inter-domain  routing  policies  to  achieve  good  end-to-end  performance? 

3.  Does  route  control  achieve  path  availability  rates  that  are  comparable  with  overlay  routing? 

5.2  Latency  and  Throughput  Performance 

We now present our results on the latency and throughput performance benefits of route control
compared with overlay routing. We first provide details of our RTT and throughput comparison
techniques. Then, we present the key results in the following order. First, we compare 1-multihoming
against 1-overlays (this is similar to the analysis in [99]). Next, we compare the benefits of
using k-overlay routing, relative to using default paths through a single ISP. Then, we compare
k-multihoming against 1-overlay routing, for k > 1. Here, we wish to quantify the benefit to
end-systems of greater flexibility in the choice of BGP routes via multihoming, relative to the power of
1-overlays. Next, we contrast k-multihoming against k-overlay routing to understand the additional
benefits gained by allowing end-systems almost arbitrary control on end-to-end paths, relative to
multihoming. Finally, we examine some of the underlying reasons for the performance differences.

[Figure 5.1 panels: (a) single ISP, BGP routing (1-multihoming); (b) multihoming with 3 ISPs (3-multihoming); (c) single ISP, overlay routing (1-overlay)]

Figure 5.1: Routing configurations: Figures (a) and (b) show 1-multihoming and 3-multihoming, respectively.
Corresponding overlay configurations are shown in (c) and (d), respectively.


5.2.1  Comparing  RTTs  and  Throughputs 

Our  comparison  of  overlays  and  multihoming  is  based  on  observations  drawn  from  the  RTT  and 
throughput  datasets  described  earlier  in  Section  4.3. 

RTT Comparison. In the RTT data set, for each 6-minute measurement interval, we build a
weighted graph over the 68 testbed nodes where the edge weights are the RTTs measured between
the corresponding node-pairs. We then use Floyd’s algorithm [31] to compute the shortest paths
between all node-pairs. We estimate the RTT performance from using k-multihoming to a given
destination by computing the minimum of the RTT estimates along the direct paths from the k ISPs
in a city to the destination node (i.e., the RTT measurements between the Akamai CDN nodes
representing the k ISPs and the destination node). To estimate the performance of k-overlay routing,
we compute the shortest paths from the k ISPs to the destination node and choose the minimum of
the RTTs of these paths.
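
The two estimates can be sketched as follows. This is a simplified illustration with a made-up 4-node RTT matrix rather than the 68-node testbed; the function names and the dense-matrix representation are assumptions of this sketch.

```python
INF = float("inf")


def all_pairs_shortest_rtt(rtt):
    """Floyd's all-pairs shortest paths over an RTT matrix (ms).

    rtt[i][j] is the measured RTT from node i to node j, or INF if unmeasured.
    Direct edges are kept, so the best overlay path may be the direct path.
    """
    n = len(rtt)
    dist = [row[:] for row in rtt]
    for mid in range(n):
        for src in range(n):
            for dst in range(n):
                via = dist[src][mid] + dist[mid][dst]
                if via < dist[src][dst]:
                    dist[src][dst] = via
    return dist


def k_multihoming_rtt(rtt, isp_nodes, dst):
    """Best direct (BGP) RTT to dst over the nodes representing the k ISPs."""
    return min(rtt[s][dst] for s in isp_nodes)


def k_overlay_rtt(shortest, isp_nodes, dst):
    """Best overlay RTT to dst over the nodes representing the k ISPs."""
    return min(shortest[s][dst] for s in isp_nodes)


# Hypothetical testbed: nodes 0 and 1 stand for two ISPs in one city,
# node 2 is a possible intermediate overlay node, node 3 is the destination.
rtt = [
    [0, 5, 20, 90],
    [5, 0, 30, 70],
    [20, 30, 0, 25],
    [90, 70, 25, 0],
]
shortest = all_pairs_shortest_rtt(rtt)
direct = k_multihoming_rtt(rtt, [0, 1], 3)
overlay = k_overlay_rtt(shortest, [0, 1], 3)
```

In this toy example the best direct path takes 70 ms (via the second ISP), while the best overlay path relays through node 2 from the first ISP and takes 45 ms.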

Note that we do not prune the direct overlay edge in the graph before performing the shortest
path computation. As a result, the shortest overlay path between two nodes could be a direct path
(i.e., chosen by BGP). Hence our comparison is not limited to direct versus indirect paths, but
is rather between direct and overlay paths. In contrast, the comparison in past studies (see, for
example, [99]) was between the direct path and the best indirect path.

Throughput Comparison. For throughput, we similarly construct a weighted, directed graph
between the testbed nodes every 30 minutes (i.e., our 1 MB object download frequency). The edge
weights are the throughputs of the 1 MB transfers (where throughput is simply the transfer size
divided by the completion time). We compute the throughput performance of k-multihoming and
k-overlay routing similar to the RTT performance computation above. Notice, however, that
computing the overlay throughput performance is non-trivial and is complicated by the problem of
estimating the end-to-end throughput for a 1 MB TCP transfer on indirect overlay paths.

Our approach here is to use round-trip time and throughput measurements on individual overlay
hops to first compute the underlying loss rates, using a standard model for TCP throughput. Since
it is likely that the paths we measure do not observe any loss, thus causing the transfers to likely
remain in their slow-start phases, we use the small connection latency model developed in [29].^
The original model for TCP throughput [83] does not accurately characterize short flows or lossless
transfers. Using the small connection latency model, in our throughput data set, we measure a mean loss rate of 1.2% and
median, 90th, 95th and 99th percentile loss rates of 0.004%, 0.5%, 1% and 40% across all paths
measured, respectively.

We can then use the sum of round-trip times and a combination of loss rates on the individual
hops as the end-to-end round-trip time and loss rate estimates, respectively, and employ the model
in [29] to compute the end-to-end overlay throughput for the 1 MB transfers. To combine loss rates
on individual links, we follow the same approach as that described in [99]. We consider two possible
“combination” or estimation functions:

1.  The  optimistic  throughput  estimate  uses  the  maximum  observed  loss  on  any  individual  overlay 
hop  along  an  overlay  path  as  an  estimate  of  the  end-to-end  overlay  loss  rate.  This  assumes 

^The typical maximum segment size (MSS) in our 1 MB transfers is 1460 bytes. Also, the initial congestion window
size is 2 segments and there is no initial 200 ms delayed ACK timeout on the first transfer.

City          1-multihoming/1-overlay
Atlanta       1.35
Bay Area      1.20
Boston        1.28
Chicago       1.29
Dallas        1.32
Los Angeles   1.22
New York      1.29
Seattle       1.34
Wash D.C.     1.30
Average       1.29

(a) 1-multihoming RTT relative to 1-overlays    (b) 1-overlay path length

Figure 5.2: Round-trip time performance: Average RTT performance of 1-multihoming relative to 1-overlay
routing is tabulated in (a) for various cities. The graph in (b) shows the distribution of the number of overlay hops
in the best 1-overlay paths, which could be the direct path (i.e., 1 overlay hop).


that  the  TCP  sender  is  primarily  responsible  for  the  observed  losses. 

2. In the pessimistic combination, we compute the end-to-end loss rate as the sum of individual
overlay hop loss rates, assuming the losses on each link to be due to independent background
traffic in the network.^

Due to the complexity of computing arbitrary length throughput-maximizing overlay paths, we
only consider indirect paths composed of at most two overlay hops in our throughput comparison.
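
The two loss-combination functions, together with the summed round-trip time, can be sketched as below. The TCP latency model of [29] that turns these estimates into a throughput figure is deliberately left out. The function names, the cap at 1.0 in the pessimistic estimate, and the example loss rates are assumptions of this sketch.

```python
def optimistic_loss(hop_losses):
    """End-to-end loss estimated as the maximum loss on any overlay hop."""
    return max(hop_losses)


def pessimistic_loss(hop_losses):
    """End-to-end loss estimated as the sum of per-hop losses.

    The cap at 1.0 is a safeguard added in this sketch so the result stays a
    valid probability.
    """
    return min(1.0, sum(hop_losses))


def independent_loss(hop_losses):
    """Exact loss for independent hops: 1 - product of (1 - p_i)."""
    survive = 1.0
    for p in hop_losses:
        survive *= 1.0 - p
    return 1.0 - survive


def end_to_end_rtt(hop_rtts_ms):
    """End-to-end RTT estimated as the sum of per-hop RTTs."""
    return sum(hop_rtts_ms)


# Hypothetical two-hop overlay path with 0.5% and 1% loss on its hops.
hops = [0.005, 0.01]
opt = optimistic_loss(hops)
pes = pessimistic_loss(hops)
exact = independent_loss(hops)
```

For small loss rates the pessimistic sum and the exact independent-loss value differ only by the product of the two rates, which is why that term can be dropped.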

5.2.2 1-Multihoming versus 1-Overlays

First, we compare the performance of overlay routing against default routes via a single ISP (i.e.,
1-overlay against 1-multihoming). Our goal is to confirm the observations in past overlay routing
studies (such as [99]). Note that, in the case of 1-overlays, the overlay path from a source node may
traverse through any intermediate node, including nodes located in the same city as the source.

Round-trip time performance. Figure 5.2(a) shows the RTT performance of 1-multihoming
relative to 1-overlay routing. Here, the performance metric (y-axis) reflects the relative RTT from

^The end-to-end loss rate over two overlay links with independent loss rates of $p_1$ and $p_2$ is
$1 - (1 - p_1)(1 - p_2) = p_1 + p_2 - p_1 p_2$. The product $p_1 p_2$ is negligible in our measurements, so we ignore it.

              Pessimistic estimate               Optimistic estimate
City          Throughput   Fraction of           Throughput   Fraction of
              metric       indirect paths        metric       indirect paths
Atlanta       1.14         17%                   1.17         21%
Bay Area      1.06         11%                   1.10         22%
Boston        1.19         22%                   1.24         26%
Chicago       1.12         13%                   1.15         18%
Dallas        1.16         18%                   1.18         22%
Los Angeles   1.18         15%                   1.21         17%
New York      1.20         14%                   1.25         26%
Seattle       1.18         28%                   1.25         35%
Wash D.C.     1.09         13%                   1.13         18%
Average       1.15         17%                   1.19         23%

Table 5.1: Throughput performance: This table shows the 1 MB TCP transfer performance of 1-overlay routing
relative to 1-multihoming (for both estimation functions). Also shown is the fraction of measurements in which
1-overlay routing selects an indirect path in each city.

1-multihoming versus the RTT when using 1-overlays, averaged over all samples to all destinations.
The difference between this metric and 1 represents the relative advantage of 1-overlay routing over
1-multihoming. Notice also that since the best overlay path could be the direct BGP path, the
performance from overlays is at least as good as that from the direct BGP path. We see from the table
that overlay routing can improve RTTs between 20% and 35% compared to using direct BGP routes
over a single ISP. The average improvement is about 29%. The observations in [99] are similar.
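
A small sketch of how such a relative metric could be computed is given below. The text says only that the metric is "averaged over all samples", so taking the mean of per-sample ratios is an assumption of this sketch, as are the function name and the example RTT values.

```python
def relative_rtt_metric(samples):
    """Mean ratio of 1-multihoming RTT to 1-overlay RTT.

    samples: (direct_rtt_ms, best_overlay_rtt_ms) pairs, one per measurement.
    Each ratio is at least 1 when the direct path is one of the overlay
    candidates, because the best overlay path is then never slower.
    """
    ratios = [direct / overlay for direct, overlay in samples]
    return sum(ratios) / len(ratios)


# Hypothetical samples: in the second one the direct path is already the best.
samples = [(70.0, 45.0), (40.0, 40.0), (60.0, 50.0)]
metric = relative_rtt_metric(samples)
advantage = metric - 1.0  # relative advantage of 1-overlay routing
```

Read this way, a tabulated value of 1.29 means the direct BGP path's RTT is about 29% higher than that of the best 1-overlay path.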

We show the distribution of overlay path lengths in Figure 5.2(b), where the direct (BGP) path
corresponds to a single overlay hop. Notice that in most cities, the best overlay path is only one
or two hops in more than 90% of the measurements. That is, the majority of the RTT performance
gains in overlay networks are realized without requiring more than a single intermediate hop. Also,
on average, the best path from 1-overlays coincides with the direct BGP path in about 55% of
the measurements (average y-axis value at x = 1 across all cities).
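
The idea of a best overlay path that uses at most one intermediate hop can be sketched as follows. This is only an illustration, assuming the RTT of an indirect path is approximated by the sum of the RTTs of its two segments; the function name, node labels, and RTT values are hypothetical and are not taken from the measurements.

```python
def best_overlay_rtt(rtt, src, dst, nodes):
    """Return (rtt_ms, path) for the best path using at most one intermediate node.

    rtt maps (a, b) -> measured RTT in ms (hypothetical input format).
    The direct path counts as a single overlay hop; a path through one
    intermediate node counts as two overlay hops.
    """
    best = (rtt[(src, dst)], [src, dst])  # start from the direct BGP path
    for mid in nodes:
        if mid in (src, dst):
            continue
        # Skip intermediates for which either segment was not measured.
        if (src, mid) not in rtt or (mid, dst) not in rtt:
            continue
        # Assumption: segment RTTs add up to the end-to-end RTT.
        total = rtt[(src, mid)] + rtt[(mid, dst)]
        if total < best[0]:
            best = (total, [src, mid, dst])
    return best


# Toy data: the indirect path through C beats the direct path.
example_rtt = {
    ("A", "B"): 80.0,
    ("A", "C"): 30.0, ("C", "B"): 35.0,
    ("A", "D"): 50.0, ("D", "B"): 50.0,
}
print(best_overlay_rtt(example_rtt, "A", "B", ["A", "B", "C", "D"]))
```

Because the direct path is always a candidate, the result can never be worse than the direct BGP path, which mirrors the observation that overlay performance is at least as good as the direct path.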

Throughput performance. In Table 5.1, we show the throughput performance of 1-overlays
relative to 1-multihoming for both the pessimistic and the optimistic estimates. 1-overlays achieve
6-20% higher throughput than 1-multihoming, according to the pessimistic estimate. According to
the optimistic throughput estimate, 1-overlays achieve 10-25% better throughput. In Table 5.1, we
also show the fraction of times an indirect overlay path obtains better throughput than the direct
path, for either throughput estimation function. Under the pessimistic throughput estimate, on
average, 1-overlay routing benefits from employing an indirect path in about 17% of the cases. Under
the optimistic estimate, this fraction is 23%.

Figure 5.3: Benefits of k-overlays: The RTT of 1-multihoming relative to k-overlays is shown in (a) and throughput
(pessimistic estimate) of k-overlays relative to 1-multihoming is shown in (b).


To summarize, 1-overlays offer significantly better round-trip time performance than 1-multihoming.
The throughput benefits are lower, but still significant. Also, in a large fraction of the measurements,
indirect 1-overlay paths offer better RTT performance than direct 1-multihoming paths.
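
The per-city values in Table 5.1 can be cross-checked against its Average row and against the ranges quoted in the text. The sketch below assumes the averages are unweighted means over the nine cities (the source does not state the weighting); the names `TABLE_5_1`, `column_means`, and `gain_range` are illustrative.

```python
# Values transcribed from Table 5.1, per city:
# (pessimistic metric, pessimistic indirect %, optimistic metric, optimistic indirect %)
TABLE_5_1 = {
    "Atlanta":     (1.14, 17, 1.17, 21),
    "Bay Area":    (1.06, 11, 1.10, 22),
    "Boston":      (1.19, 22, 1.24, 26),
    "Chicago":     (1.12, 13, 1.15, 18),
    "Dallas":      (1.16, 18, 1.18, 22),
    "Los Angeles": (1.18, 15, 1.21, 17),
    "New York":    (1.20, 14, 1.25, 26),
    "Seattle":     (1.18, 28, 1.25, 35),
    "Wash D.C.":   (1.09, 13, 1.13, 18),
}


def column_means(table):
    """Unweighted mean of each of the four columns (an assumed averaging rule)."""
    rows = list(table.values())
    return tuple(sum(row[i] for row in rows) / len(rows) for i in range(4))


def gain_range(table, col):
    """(min, max) throughput gain in whole percent for a metric column."""
    gains = [round((row[col] - 1.0) * 100) for row in table.values()]
    return min(gains), max(gains)


means = column_means(TABLE_5_1)
print([round(means[0], 2), round(means[1]), round(means[2], 2), round(means[3])])
print(gain_range(TABLE_5_1, 0), gain_range(TABLE_5_1, 2))
```

A throughput metric of 1.14 corresponds to a 14% gain for 1-overlays over 1-multihoming, so the minimum and maximum of each metric column give the quoted ranges.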

5.2.3 1-Multihoming versus k-Overlays

In this section we compare the flexibility offered by multihoming route control in combination with
overlay routing against using default routes via a single ISP (i.e., k-overlays against 1-multihoming).

In Figure 5.3(a), we show the RTT performance of 1-multihoming relative to k-overlays as a
function of k. Notice that k-overlay routing achieves 25-50% better RTT performance than
1-multihoming, for k = 3. Notice also that the RTT performance from k-overlay routing (k ≥ 3) is
about 5-20% better than that from 1-overlay routing. Figure 5.3(b) similarly compares the
throughput performance of k-overlays relative to 1-multihoming, for the pessimistic estimate. Again,
3-overlay routing, for example, is 20-55% better than 1-multihoming and about 10-25% better than
1-overlay routing. The benefit beyond k = 3 is marginal across most cities, for both RTT and
throughput.

In summary, both k-multihoming and k-overlay routing offer much better performance than
1-multihoming, in terms of both RTT and throughput. In addition, k-overlay routing (k ≥ 3) achieves
significantly better performance compared to 1-overlay routing.

5.2.4 k-Multihoming versus 1-Overlays

In the next two sections, we provide a head-to-head comparison of multihoming and overlay routing.
First, in this section, we allow endpoints the flexibility of multihoming route control and compare
the resulting performance against 1-overlays.

Figure 5.4: Multihoming versus 1-overlays: The RTT of k-multihoming relative to 1-overlays is shown in (a) and
throughput (pessimistic) of 1-overlays relative to k-multihoming in (b).

In Figure 5.4, we plot the performance of k-multihoming relative to 1-overlay routing. Here,
we compute the average ratio of the best RTT or throughput to a particular destination, as achieved
by either technique. The average is taken over paths from each city to destinations in other cities,
and over time instants for which we have a valid measurement over all ISPs in the city.* We also
note that in all but three cities, the best 3-multihoming ISPs according to RTT were the same as the
best 3 according to throughput; in the three cities where this did not hold, the third and fourth best
ISPs were simply switched and the difference in throughput performance between them was less
than 3%.
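
The averaging procedure just described can be sketched for the RTT case. This is a minimal illustration under assumed data structures: each sample is one (destination, time instant) pair holding the direct RTT per ISP and the best 1-overlay RTT; the field names, ISP labels, and numbers are hypothetical.

```python
from statistics import mean


def relative_rtt(samples, multihoming_isps, overlay_isp):
    """Mean ratio of the best k-multihoming RTT to the best 1-overlay RTT.

    samples: list of dicts, one per (destination, time instant), shaped as
      {"direct": {isp: rtt_ms or None}, "overlay": {isp: best_overlay_rtt_ms}}
    (a hypothetical layout). Instants that lack a valid measurement over
    all ISPs in the city are skipped, as in the text.
    """
    ratios = []
    for sample in samples:
        direct = sample["direct"]
        if any(value is None for value in direct.values()):
            continue  # no valid measurement over all ISPs at this instant
        best_multihoming = min(direct[isp] for isp in multihoming_isps)
        ratios.append(best_multihoming / sample["overlay"][overlay_isp])
    return mean(ratios)


samples = [
    {"direct": {"ISP1": 100.0, "ISP2": 80.0, "ISP3": 90.0}, "overlay": {"ISP1": 80.0}},
    {"direct": {"ISP1": 60.0, "ISP2": None, "ISP3": 70.0}, "overlay": {"ISP1": 55.0}},
    {"direct": {"ISP1": 50.0, "ISP2": 66.0, "ISP3": 55.0}, "overlay": {"ISP1": 40.0}},
]
print(relative_rtt(samples, ["ISP1", "ISP2", "ISP3"], "ISP1"))
```

A value above 1 means 1-overlays achieve lower RTT on average; a value below 1 means multihoming does.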

The comparison according to RTT is shown in Figure 5.4(a). The performance advantage of
1-overlays is less than 5% for k = 3 in nearly all cities. In fact, in some cities, e.g., Bay Area
and Chicago, 3-multihoming is marginally better than overlay routing. As the number of ISPs is
increased, multihoming is able to provide shorter round-trip times than overlays. Figure 5.4(b)
shows relative benefits according to the pessimistic throughput estimate. Here, multihoming for
k ≥ 3 actually provides 2-12% better throughput than 1-overlays across all cities. The results are
similar for the optimistic computation and are omitted for brevity.

In summary, the performance advantages of 1-overlays are vastly reduced (or eliminated) when
the endpoint is allowed greater flexibility in the choice of BGP paths via multihoming route control.
This implies that multihoming at the first hop is essential to overcome occasional serious problems
in the access ISP(s).

5.2.5 k-Multihoming versus k-Overlays

In the previous section, we evaluated 1-overlay routing, where all overlay paths start from a single
ISP in the source city. In this section, we allow overlays additional flexibility by permitting them to
initially route through more of the available ISPs in each source city. Specifically, we compare the
performance benefits of k-multihoming against k-overlay routing.

* Across all cities, an average of 10% of the time instants did not have a valid measurement across all ISPs; nearly all
of these cases were due to limitations in our data collection infrastructure, and not failed download attempts.

Figure 5.5: Round-trip time improvement: Round-trip time from k-multihoming relative to k-overlay routing, as
a function of k, is shown in (a). In (b), we show the distribution of the number of overlay hops in the best k-overlay
paths, for k = 3.


In the case of k-overlays, the overlay path originating from a source node may traverse any
intermediate nodes, including those located in the same city as the source. Notice that the
performance from k-overlays is at least as good as that from k-multihoming (since we allow overlays to
take a direct BGP path). The question, then, is how much more advantage overlays provide if
multihoming is already employed by the source.

Round-trip time performance. Figure 5.5(a) shows the improvement in RTT for k-multihoming
relative to k-overlays, for various values of k. We see that on average, for k = 3, overlays provide
3-12% better RTT performance than the best multihoming solution in most of the cities in our study.
The performance gap between multihoming and overlays is less significant for k ≥ 4.

Figure 5.5(b) shows the distribution of the number of overlay hops in the paths selected by
3-overlay routing optimized for RTT. The best overlay path coincides with the best 3-multihoming
BGP path in 67% of the cases, on average across all cities (the average y-axis value at x = 1). Recall
that the corresponding fraction for 1-overlay routing in Figure 5.2(b) was 55%. With more ISP
links to choose from, overlay routing selects a higher fraction of direct BGP paths, as opposed to
choosing from the greater number of indirect paths also afforded by multihoming.

Throughput performance. Figure 5.6(a) shows the throughput performance of k-multihoming
relative to k-overlays using the pessimistic throughput estimation function. From this figure, we
see that multihoming achieves throughput performance within 1-10% of overlays, for k = 3. The
performance improves up to k = 3 or k = 4. In all the cities, the throughput performance of
4-multihoming is within 3% of overlay routing. In Figure 5.6(b), we also show the fraction of
measurements where an indirect 3-overlay path offers better performance than the direct 3-multihoming
path, for the pessimistic throughput estimate. On average, this fraction is about 8%. Notice that this
is again lower than the corresponding percentage for 1-overlays from Table 5.1 (≈ 17%).

To summarize, when employed in conjunction with multihoming, overlay routing offers marginal
benefits over employing multihoming alone. In addition, k-overlay routing selects a larger fraction
of direct BGP-based end-to-end paths, compared to 1-overlay routing.

[Figure 5.6 panels: (a) Throughput improvement (pessimistic estimate); (b) Fraction of indirect paths in
3-overlay routing, a table with columns City and Fraction of indirect paths]

Figure 5.6: Throughput improvement: Throughput performance of k-multihoming relative to k-overlays for
various cities is shown in (a). The table in (b) shows the fraction of measurements on which k-overlay routing selected
an indirect end-to-end path, for the case of k = 3.

5.2.6  Unrolling  the  Averages 

So far, we presented averages of the performance differences for various forms of overlay
routing and multihoming route control. In this section, focusing on 3-overlays and 3-multihoming, we
present the underlying distributions in the performance differences along the paths we measure. Our
goal in this section is to understand if the averages are particularly skewed by: (1) certain
destinations, for each source city, (2) a few measurement samples on which overlays offer significantly
better performance than multihoming, or (3) time-of-day or day-of-week effects.

Performance per destination. In Figure 5.7(a), we show the distribution of the average difference
in RTT between the best 3-multihoming path and the best 3-overlay path to destination nodes in the
testbed from various origin cities (i.e., each point represents one destination). In most cities, the
average RTT differences across 80% of the destinations are less than 10ms. Notice that in most cities,
the difference is greater than 15ms for less than 5% of the destinations.

In Figure 5.7(b), we consider the distribution of the average throughput difference of the best
3-multihoming path and the best 3-overlay path for the pessimistic estimate of throughput. We see
the throughput difference is less than 1 Mbps for 60-99% of the destinations. We also note that, for
1-5% of the destinations, the difference is in excess of 4 Mbps. Recall from Figure 5.6, however,
that these differences result in an average relative performance advantage for overlays of less than
1-10% (for k = 3).

[Figure 5.7 panels: (a) Mean difference in round-trip times; (b) Mean difference in throughputs (pessimistic)]

Figure 5.7: Performance per destination: Figure (a) is a CDF of the mean difference in RTTs along the best overlay
path and the best direct path, across paths measured from each city. Similarly, Figure (b) plots the CDF of the
mean difference in throughputs (pessimistic estimate).

[Figure 5.8 panel (b): Throughput (pessimistic); axis: Difference in throughputs (Mbps)]

Figure 5.8: Underlying distributions: Figure showing the mean, median, 10th percentile and 90th percentile
difference across various source-destination pairs. Figure (a) plots RTT, while figure (b) plots throughput (pessimistic
estimate).

Mean versus other statistics. In Figures 5.8(a) and (b) we plot the average, median, and 10th and
90th percentiles of the difference in RTT and (pessimistic) throughput, respectively, between the
best 3-multihoming option and the best 3-overlay paths for all cities. In Figure 5.8(a) we see that
the median RTT difference is fairly small. More than 90% of the median RTT differences are less
than 10ms. The 90th percentile of the difference is marginally higher, with roughly 10% greater
than 15ms. The median throughput differences in Figure 5.8(b) are also relatively small: less than
500Kbps about 90% of the time. Considering the upper range of the throughput difference (i.e., the
90th percentile difference), we see that a significant fraction (about 20%) are greater than 2 Mbps.

These results suggest that the absolute round-trip and throughput differences between
multihoming and overlay routing are small for the most part, though there is a small fraction of cases
where differences are more significant, particularly for throughput.

Time-of-day and day-of-week effects. We also consider the effects of daily and weekly network
usage patterns on the relative performance of k-multihoming and k-overlays. It might be expected
that route control would perform worse during peak periods since overlay paths have greater
freedom to avoid congested parts of the network. We do not see any discernible time-of-day effects in
paths originating from a specific city, however, both in terms of RTT and throughput performance.

Similarly, we also examine weekly patterns to determine whether the differences are greater
during particular days of the week, but again there are no significant differences for either RTT or
throughput. We omit both these results for brevity. The lack of a time-of-day effect on the relative
performance may indicate that ISP network operators already take such patterns into account when
performing traffic engineering.

In summary, k-overlays offer significantly better performance relative to k-multihoming for a
small fraction of transfers from a given city. We observed little dependence on the time of the day
or day of the week in the performance gap between overlays and multihoming.

5.2.7  Reasons  for  Performance  Differences 

Next, we try to identify the causes of performance differences between k-multihoming and
k-overlay routing. We focus on the RTT performance, and the case k = 3. First, we ask if indirect
paths primarily improve propagation delay or mostly select less congested routes than the direct
paths. Then, we focus on how often the best-performing indirect paths violate common inter-domain
and peering policies.

5.2.7.1  Propagation  Delay  and  Congestion  Improvement 

In this section, we want to understand whether the modest advantage we observe for overlay routing
is due primarily to its ability to find “shorter” (i.e., lower propagation delay) paths outside of BGP
policy routing, or whether the gains come from being able to avoid congestion in the network.

The pairwise instantaneous RTT measurements we collect may include a queuing delay
component in addition to the base propagation delay. When performance improvements are due primarily
to routing around congestion, we expect the difference in propagation delay between the indirect
and direct path to be small. Similarly, when the propagation difference is large, we can attribute the
performance gain to the better efficiency of overlay routing compared to BGP in choosing “shorter”
end-to-end paths. In our measurements, to estimate the propagation delay on each path, we take the
5th percentile of the RTT samples for the path.
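
The propagation-delay estimate described above can be sketched as a low-percentile filter over a path's RTT samples. The snippet assumes the nearest-rank definition of a percentile, since the text does not say which definition was used; the function name and sample values are illustrative.

```python
def propagation_delay(rtt_samples_ms, pct=5):
    """Estimate propagation delay as a low percentile of RTT samples.

    Uses the nearest-rank percentile (an assumption). pct is an integer
    percentage, 5 by default to match the text.
    """
    if not rtt_samples_ms:
        raise ValueError("need at least one RTT sample")
    ordered = sorted(rtt_samples_ms)
    # Integer ceiling of pct * n / 100, with a minimum rank of 1.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Samples with occasional queuing spikes; the low percentile ignores them.
samples = [52, 48, 47, 90, 49, 51, 120, 50, 47, 55]
print(propagation_delay(samples))
```

Taking a low percentile rather than the minimum makes the estimate less sensitive to a single anomalously small sample, while still discarding samples inflated by queuing.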

In Figure 5.9, we show a scatter plot of the overall RTT improvement (x-axis) and the
corresponding propagation time difference (y-axis) offered by the best overlay path relative to the best
multihoming path. The graph only shows measurements in which the indirect overlay paths offer
an improved RTT over the best direct path. Points near the y = 0 line represent cases in which the
RTT improvement has very little associated difference in propagation delay. Points near the y = x
line are paths in which the RTT improvement is primarily due to better propagation time.

Figure 5.9: Propagation vs congestion: A scatter plot of the RTT improvement (x-axis, difference in round-trip
time in ms) vs propagation time improvement (y-axis) of the indirect overlay paths over the direct paths.

For paths with a large RTT improvement (e.g., > 50ms), the points are clustered closer to the
y = 0 line, suggesting that large improvements are due primarily to routing around congestion.
We also found, however, that 66% of all the points lie above the y = x/2 line. These are closer
to the y = x line than y = 0, indicating that a majority of the round-trip improvements do arise
from a reduction in propagation delay. In contrast, Savage et al. [99] observe that both avoiding
congestion and the ability to find shorter paths are equally responsible for the overall improvements
from overlay routing. The difference in our observations from those in [99] could be due to the fact
that Internet paths are better provisioned and less congested today than 3-4 years ago. However, they
are sometimes circuitous, contributing to inflation in end-to-end paths (see [104, 110] for detailed
accounts of path inflation).
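
The reading of the scatter plot relative to the y = x/2 line can be expressed as a small classifier. This is a sketch: the labels "propagation" and "congestion" and the function names are illustrative, and the sample points are invented.

```python
def classify_improvement(rtt_gain_ms, prop_gain_ms):
    """Label an improved overlay path by the dominant source of its RTT gain.

    rtt_gain_ms is the x-coordinate (overall RTT improvement) and
    prop_gain_ms the y-coordinate (propagation delay improvement).
    """
    if rtt_gain_ms <= 0:
        raise ValueError("only paths with an RTT improvement are plotted")
    # Above y = x/2 the point is closer to y = x than to y = 0.
    if prop_gain_ms > rtt_gain_ms / 2:
        return "propagation"
    return "congestion"


def fraction_propagation(points):
    """Fraction of (rtt_gain, prop_gain) points lying above the y = x/2 line."""
    labels = [classify_improvement(x, y) for x, y in points]
    return labels.count("propagation") / len(labels)


points = [(60.0, 5.0), (10.0, 8.0), (12.0, 9.0)]
print(fraction_propagation(points))
```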

To further investigate the relative contributions of propagation delay and congestion
improvements, we focus more closely on cases where indirect overlay paths offer a significant improvement
(> 20ms) over the best direct paths. Visually, these are points lying to the right of the x = 20 line
in Figure 5.9. In Table 5.2 we present a classification of the indirect overlay paths offering > 20ms
RTT improvement. Recall that, in our measurement, 33% of the indirect 3-overlay paths had a
lower RTT than the corresponding best direct path (Section 5.2.5, Figure 5.5(b)). However, only
4.8% of these paths improved the delay by more than 20ms (Table 5.2, row 3). For less than half of
these, or 2.2% of all lower delay overlay paths, the propagation delay improvement relative to direct
paths was less than 50% of the overall RTT improvement. Visually, these points lie to the right of
the x = 20 line and below the y = x/2 line in Figure 5.9. Therefore, these are paths where the significant
improvement in performance comes mainly from the ability of the overlay to avoid congested links.
Also, when viewed in terms of all overlay paths (see Table 5.2, column 3), we see that these paths
form a very small fraction of all overlay paths (≈ 0.7%).
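
The two percentage columns of Table 5.2 are linked by the 33% base rate: a share of lower delay paths becomes a share of all overlay paths after multiplying by 0.33. The short check below reproduces that arithmetic; the dictionary keys and function name are illustrative labels.

```python
# Share of overlay paths with a lower RTT than the best direct path (from the text).
LOWER_DELAY_SHARE = 0.33

# Percent of lower delay paths, transcribed from Table 5.2.
PCT_OF_LOWER_DELAY = {
    "> 20ms improvement": 4.8,
    "prop delay gain < 50% of total": 2.2,
    "prop delay gain < 25% of total": 1.7,
    "prop delay gain < 10% of total": 1.3,
}


def as_pct_of_all_paths(pct_of_lower_delay, base=LOWER_DELAY_SHARE):
    """Convert a percentage of lower delay paths into a percentage of all paths."""
    return round(pct_of_lower_delay * base, 1)


for label, pct in PCT_OF_LOWER_DELAY.items():
    print(label, as_pct_of_all_paths(pct))
```

For example, 4.8% of 33% is about 1.6%, and 2.2% of 33% is about 0.7%, matching the last column of the table.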

Total fraction of lower delay overlay paths: 33%

                                                 Fraction of          Fraction of all
                                                 lower delay paths    overlay paths
Indirect paths with > 20ms improvement           4.8%                 1.6%
Prop delay improvement < x% of overall
improvement (whenever overall improvement
> 20ms):
    < 50%                                        2.2%                 0.7%
    < 25%                                        1.7%                 0.6%
    < 10%                                        1.3%                 0.4%

Table 5.2: Analysis of overlay paths: Classification of indirect paths offering > 20ms improvement in RTT
performance.

Figure 5.10: “Circuitousness” of routes: Figure plotting the propagation delay of the best indirect path (y-axis)
against the best multihoming path (x-axis).

Finally, if we consider the propagation delay of the best indirect overlay path versus the best
multihoming path, we can get some idea of the ability of either system to avoid overly “circuitous”
paths, arising from policy routing, for example. Figure 5.10 shows a scatter plot of the propagation
delay of the best direct path from a city (x-axis) and the best propagation delay via an indirect path
(y-axis). Again, points below the y = x line are cases in which overlay routing finds shorter paths
than conventional BGP routing, and vice versa. Consistent with the earlier results, we see that the
majority of points lie below the y = x line where overlays find lower propagation delay paths.
Moreover, for cases in which the direct path is shorter (above the y = x line), the difference is
generally small, under 10 or 15ms.

In summary, a vast majority of RTT performance improvements from overlay routing arise from
its ability to find shorter end-to-end paths compared to the best direct BGP paths. However, the most
significant improvements stem from the ability of overlay routing to avoid congested ISP links.†


5.2.7.2 Inter-domain and Peering Policy Compliance

To further understand the performance gap between some overlay routes and direct BGP routes, we
categorize the overlay routes by their compliance with common inter-domain and peering policies.
Inter-domain and peering policies typically represent business arrangements between ISPs [42, 80].
Because end-to-end overlay paths need not adhere to such policies, we try to quantify the
performance gain that can be attributed to ignoring them.

ISPs typically obey two key inter-domain policies [43]:

1. Valley-free routing: ISPs generally do not provide transit between their providers or peers
because it represents a cost to them.

2. Prefer customer routing: When possible, it is economically preferable for an ISP to route
traffic via customers rather than providers or peers, and peers rather than providers.
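
The valley-free rule above can be checked mechanically once each AS-level link is labeled with a business relationship. The sketch below uses a common formalization (zero or more customer-to-provider links, then at most one peer link, then zero or more provider-to-customer links); it is not necessarily the exact procedure used in this analysis, and the relationship labels, AS names, and function name are hypothetical.

```python
def is_valley_free(as_path, rel):
    """Check an AS path against a common formalization of valley-free routing.

    rel maps (a, b) -> "c2p" if a is a customer of b, "p2c" if a is a
    provider of b, or "p2p" if a and b are peers (hypothetical encoding).
    """
    descending = False  # True once the path has crossed a peer link or gone downhill
    for a, b in zip(as_path, as_path[1:]):
        kind = rel[(a, b)]
        if kind == "c2p":
            if descending:
                return False  # climbing again after descending forms a valley
        elif kind == "p2p":
            if descending:
                return False  # a peer link is only allowed at the top of the path
            descending = True
        elif kind == "p2c":
            descending = True
        else:
            raise ValueError(f"unknown relationship: {kind}")
    return True


rel = {
    ("S", "A"): "c2p", ("A", "B"): "p2p", ("B", "D"): "p2c",
    ("A", "C"): "p2c", ("C", "B"): "c2p",
}
print(is_valley_free(["S", "A", "B", "D"], rel))  # up, across, down
print(is_valley_free(["S", "A", "C", "B"], rel))  # down to C, then up again
```

In the second path, AS C would be carrying traffic between two of its providers, which is the kind of transit the valley-free policy rules out.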

In addition, Spring et al. [104] observed that ISPs often obey two common peering policies:

1. Early exit: ISPs “offload” traffic to peers quickly by using the peering point closest to the
source.

2. Late exit: Some ISPs cooperatively carry traffic further than they have to by using peering
points closer to the destination.

Also, BGP path selection is impacted by the fact that the routes must have the shortest AS hop
count. In this section, we focus on indirect overlay paths (i.e., > 1 virtual hop) that provide better
end-to-end round-trip time performance than the corresponding direct BGP paths. To characterize
these routes, we identified AS-level paths using traceroutes performed over our testbed during the
same period as the RTT measurements. Each turnaround time measurement was matched with a
traceroute that occurred within 20 minutes of it (2.7% did not have corresponding traceroutes and
were ignored in this analysis). We map IP addresses in the traceroute data to AS numbers using a
commercial tool which uses BGP tables from multiple vantage points to extract the “origin AS” for
each IP prefix [3].
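
The matching of RTT measurements to traceroutes within a 20-minute window can be sketched as a nearest-timestamp lookup. The text only requires a traceroute within 20 minutes; choosing the nearest one is an assumption made here, and the function name and timestamps are hypothetical.

```python
from bisect import bisect_left

MAX_GAP_S = 20 * 60  # 20-minute matching window, in seconds


def match_traceroute(rtt_time, trace_times, max_gap=MAX_GAP_S):
    """Return the traceroute timestamp nearest to rtt_time, or None.

    trace_times must be sorted in ascending order (epoch seconds).
    None is returned when no traceroute lies within max_gap seconds,
    in which case the measurement would be dropped from the analysis.
    """
    i = bisect_left(trace_times, rtt_time)
    # Only the neighbors on either side of the insertion point can be nearest.
    candidates = trace_times[max(0, i - 1): i + 1]
    if not candidates:
        return None
    nearest = min(candidates, key=lambda t: abs(t - rtt_time))
    return nearest if abs(nearest - rtt_time) <= max_gap else None


trace_times = [0, 3600, 7200]
print(match_traceroute(1000, trace_times))  # within 20 minutes of the first traceroute
print(match_traceroute(1800, trace_times))  # 30 minutes from both neighbors
```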

One issue with deriving the AS path from traceroutes is that these router-level AS paths may
be different than the actual BGP AS path [69, 13, 51], often due to the appearance of an extra AS
number corresponding to an Internet exchange point or a sibling AS.‡ In our analysis, we omit
exchange point ASes, and also combine the sibling ASes, for those that we are able to identify. To
ascertain the policy compliance of the indirect overlay paths, we used AS relationships generated
by the authors of [108] during the same period as our measurements.

† The improvements from overlay routing could also be from overlays choosing higher bandwidth paths. This aspect
is difficult to quantify and we leave it as an open problem.

‡ Two ASes identified as peers may actually be siblings [108, 42], in which case they would provide transit for each
other’s traffic because they are administered by the same entity. We classified peers as siblings if they appeared to provide
transit in the direct BGP paths in our traceroutes, and also manually adjusted pairings that were not related.

                                Improved Overlay Paths       >20ms Imprv Paths
                                %      RTT Imprv (ms)        %      RTT Imprv (ms)
                                       Avg     90th                 Avg     90th
Violates Inter-Domain Policy    66.8   8.3     17            68.7   33.7    40
  Valley-Free Routing           61.0   8.2     17            58.5   33.7    40
  Prefer Customer               14.9   8.9     18            16.3   41.3    47
Valid Inter-Domain Path         25.2   7.3     15            19.4   36.1    44
  Same AS-Level Path            15.3   6.9     13            9.4    40.9    53
    Earlier AS Exit             1.9    5.6     --            0.8    43.2    51
    Similar AS Exits            6.9    6.4     12            4.9    39.6    55
    Later AS Exit               6.5    7.9     14            3.7    42.1    51
  Diff AS-Level Path            9.9    8.0     17            10.0   31.5    39
    Longer than BGP Path        4.5    7.6     17            4.6    30.9    43
    Same Len as BGP Path        4.8    8.6     18            5.3    32.0    37
    Shorter than BGP Path       0.6    6.2     9             0.1    36.4    55
Unknown                         8.0                          11.9

Table 5.3: Overlay routing policy compliance: Breakdown of the mean and 90th percentile round trip time
improvement of indirect overlay routes by: (1) routes that did not conform to common inter-domain policies, and (2)
routes that were valid inter-domain paths but either exited ASes at different points than the direct BGP route or
were different than the BGP route.

In our AS-level overlay path construction, we ignore the ASes of intermediate overlay nodes if
they were used merely as non-transit hops to connect overlay path segments. For example, consider
the overlay path between a source in AS S1 and a destination in D2, composed of the two AS-level
segments S1 A1 B1 C1 and C1 B2 D2, where the intermediate node is located in C1. If the
time spent in C1 is short (< 3ms), and B1 and B2 are the same ISP, we consider the AS path as
S1 A1 B1 D2; otherwise we consider it as S1 A1 B1 C1 B2 D2. Since we do this only for
intermediate ASes that are not a significant factor in the end-to-end round-trip difference, we avoid
penalizing overlay paths for policy violations that are just artifacts of where the intermediate hop
belongs in the AS hierarchy.
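
The segment-joining rule in the example above can be sketched as follows. The `same_isp` predicate, the function name, and the way time spent in the intermediate AS is supplied are all assumptions for illustration; only the 3ms threshold and the S1/A1/B1/C1/B2/D2 example come from the text.

```python
def join_segments(seg1, seg2, time_in_mid_ms, same_isp, threshold_ms=3.0):
    """Join two AS-level overlay segments that meet at an intermediate node's AS.

    seg1 ends, and seg2 starts, with the AS hosting the intermediate overlay
    node. same_isp(a, b) is a caller-supplied predicate (hypothetical) that
    says whether two AS labels belong to the same ISP.
    """
    if seg1[-1] != seg2[0]:
        raise ValueError("segments must meet at the intermediate node's AS")
    before, after = seg1[-2], seg2[1]
    if time_in_mid_ms < threshold_ms and same_isp(before, after):
        # Non-transit hop: drop the intermediate AS and merge the two
        # adjacent ASes of the same ISP into one.
        return seg1[:-1] + seg2[2:]
    # Otherwise keep the intermediate AS, listing it once.
    return seg1 + seg2[1:]


# Toy predicate: labels such as B1 and B2 share an ISP if their letters match.
same_letter = lambda a, b: a[0] == b[0]
print(join_segments(["S1", "A1", "B1", "C1"], ["C1", "B2", "D2"], 2.0, same_letter))
print(join_segments(["S1", "A1", "B1", "C1"], ["C1", "B2", "D2"], 5.0, same_letter))
```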

Table 5.3 classifies the indirect overlay paths by policy conformance. As expected, the majority
of indirect paths (67%) violated either the valley-free routing or prefer customer policies. However,
a large fraction of overlay paths (25%) appeared to be policy compliant. We sub-categorize the latter
fraction of paths further by examining which AS-level overlay paths were identical to the AS-level
direct BGP path and which ones were different.

For  each  overlay  path  that  was  identical,  we  characterize  it  as  exiting  an  AS  earlier  than  the  direct 
path  if  it  remained  in  the  AS  for  at  least  20ms  less  than  it  did  in  the  direct  path.  We  characterized  it 
as  exiting  later  if  it  remained  in  an  AS  for  at  least  20ms  longer.  We  consider  the  rest  of  the  indirect 
paths  to  be  “similar”  to  the  direct  BGP  paths.  We  see  that  almost  all  identical  AS-level  overlay 
paths  either  exited  later  or  were  similar  to  the  direct  BGP  path.  This  suggests  that  cooperation 
among  ISPs,  e.g.,  in  terms  of  late  exit  policies,  can  improve  performance  on  BGP  routes  and  further 
close  the  gap  between  multihoming  and  overlays.  We  also  note  that  for  the  AS-level  overlay  paths 
that  differed,  the  majority  were  the  same  length  as  the  corresponding  direct  path  chosen  by  BGP. 
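
The earlier/later/similar rule for overlay paths with an identical AS-level path can be written as a threshold test on the time spent in an AS. How per-AS residence time is derived from the traceroutes is not specified here, so the inputs below are hypothetical; only the 20ms threshold comes from the text.

```python
def classify_exit(overlay_ms_in_as, direct_ms_in_as, threshold_ms=20.0):
    """Classify where an overlay path exits an AS relative to the direct path.

    Inputs are the time (ms) each path remains inside the same AS,
    which are assumed to be available from traceroute data.
    """
    delta = overlay_ms_in_as - direct_ms_in_as
    if delta <= -threshold_ms:
        return "earlier"  # at least 20ms less time in the AS
    if delta >= threshold_ms:
        return "later"    # at least 20ms more time in the AS
    return "similar"


print(classify_exit(5.0, 30.0))   # earlier
print(classify_exit(45.0, 20.0))  # later
print(classify_exit(22.0, 20.0))  # similar
```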

To summarize, in achieving better RTT performance than direct BGP paths, most indirect
overlay paths violate common inter-domain routing policies. We observe that a fraction of the
policy-compliant overlay paths could be realized by BGP if ISPs employed cooperative peering policies
such as late exit.


5.3  Resilience  to  Path  Failures 

BGP’s policy-based routing architecture masks a great deal of topology and path availability
information from end-networks in order to respect commercial relationships and limit the impact of local
changes on neighboring downstream ASes. This design, while having advantages, can adversely
affect the ability of end-networks to react quickly to service interruptions since notifications via
BGP’s standard mechanisms can be delayed by tens of minutes [62]. Networks employing
multihoming route control can mitigate this problem by monitoring paths across ISP links, and switching
to an alternate ISP when failures occur. Overlay networks provide the ability to quickly detect and
route around failures by frequently probing the paths between all overlay nodes.

In this section, we perform two separate preliminary analyses of the resilience datasets described
earlier in Section 4.4 to assess the ability of both mechanisms to withstand end-to-end path failures
and improve availability of Internet paths. As described in Chapter 4, the first approach evaluates
the availability provided by route control based on active probe measurements on our testbed. In
the second, we compute the end-to-end path availability from both route control and overlays using
estimated availabilities of routers along the paths.

Our  aim  is  to  answer  the  following  questions:  Is  multihoming  route  control  similar  to  overlay 
routing  in  improving  end-to-end  fault  tolerance?  If  not,  is  the  resilience  from  multihoming  still 
reasonable  enough  to  mask  almost  all  failures  experienced  by  the  end  network? 

5.3.1  Active  Measurements  of  Path  Availability 

We evaluated the ability of multihoming to improve path availability earlier in Section 4.4.1. As
noted then, even using two ISPs can significantly improve the availability of Internet paths. For
example, less than 1% of the paths originating from the cities we studied have an availability under
99.9% when 2-multihoming is employed. The minimum availability across all the paths is
99.85%, which is much higher than without multihoming (91%). From our analysis of overlay routing,
we find that it is indeed able to circumvent even the few failures that route control could not
avoid (results omitted for brevity). However, this results in only a marginal improvement over route
control, which already offers very good availability.

Figure 5.11: Availability comparison: Comparison of availability averaged across paths originating from six cities
using a single ISP, 3-multihoming, 1-overlays, and 3-overlays. ISPs are chosen based on their round-trip time
performance.

5.3.2  Path  Availability  Analysis 

In Figure 5.11 we compare the average availability using overlays and route control on paths
originating from 6 cities to all destinations in our testbed, based on the router failure data presented
earlier in Section 4.4.2. For overlay routing, we only calculate the availability of the paths for the
first and last overlay hop (since these will be the same no matter which intermediate hops are used),
and assume that there is always an available path between other intermediate hops. An ideal overlay
has a practically unlimited number of path choices, and can avoid a large number of failures in the
middle of the network.

As described earlier, the availability of the paths in our testbed is relatively high overall.
3-multihoming improves the average availability by 0.15-0.24% in all the cities. In most cases,
1-overlays have slightly higher availability (by at most about 0.07%). Since a 1-overlay has arbitrary
flexibility in choosing intermediate hops, only about 2.7 routers are common (on average) between
all possible overlay paths, compared to about 4.2 in the 3-multihoming case. However, note that a
1-overlay path using a single ISP is more vulnerable to access link failures than when multihoming
is employed. For example, the low availability of the 1-overlay in Chicago is due to two factors:
(1) the chosen ISP (based on RTT performance) is a tier 4 network, which has internal routers with
relatively lower availability, and (2) all paths exiting that ISP have the first 5 hops in common and
hence have a high chance of correlated failures. Finally, we see that using a 3-overlay usually makes
routes only slightly more available than when using a 1-overlay (by 0.01% to 0.08%, excluding Chicago).
This is because at least one router is shared by all paths approaching a destination, so failures at that
router impact all possible overlay paths.

In summary, we note that despite the greater flexibility of overlays, route control with 3-multihoming
is still able to achieve an estimated availability within 0.08-0.10% (or about 7 to 9 hours each year)
of 3-overlays.
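
The conversion from an availability gap to hours per year quoted above is simple arithmetic. The following is a minimal sketch, assuming a 365-day year (8,760 hours); the function name is illustrative and does not come from the study.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def downtime_hours_per_year(availability_gap_percent):
    """Convert an availability difference (in percent) into hours per year."""
    return availability_gap_percent / 100.0 * HOURS_PER_YEAR


low = downtime_hours_per_year(0.08)
high = downtime_hours_per_year(0.10)
print(f"{low:.1f} to {high:.1f} hours per year")  # 7.0 to 8.8 hours per year
```

Under this assumption, a gap of 0.08-0.10% corresponds to roughly 7 to 9 hours per year, matching the figure given in the text.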

5.4  Measurement  Caveats,  Summary  of  Observations  and  their  Implications

In  this  section,  we  summarize  the  observations  made  from  our  measurements.  We  highlight  other 
fundamental  trade-offs  between  overlay  routing  and  multihoming  route  control  that  are  difficult  to
assess.  We  also  comment  on  the  limitations  of  our  study. 

Our key observations are summarized in Table 5.4. As expected, our results show that overlay
routing does provide improved latency, throughput, and reliability over multihoming route control.
However, the difference between the two systems is insignificant for all practical purposes. This is
despite the fact that multihoming offers much less flexibility in routing than overlays. Moreover,
multihoming does not require a third-party deployment and is a purely endpoint-based mechanism.

We found that overlay routing’s performance gains arise primarily from the ability to find routes
that are physically shorter (i.e., shorter propagation delay). In addition, its reliability advantages
stem from having at its disposal a superset of the routes available to standard routing. The surprise
in our results is that, while past studies of overlay routing have shown this advantage to be large, we
found that careful use of a few additional routes via multihoming at the end-network was enough
to significantly reduce the advantage of overlays. Since their performance is similar, the question
remains whether overlays or multihoming is the better choice. To answer this, we must look at other
factors such as cost, deployment issues and future upgrades to BGP routing. We discuss these issues
next. We also outline key constraints imposed by our comparison study.

Cost of operation. Unfortunately, it was difficult to consider the cost of implementing route control
or overlays in our evaluation. In the case of multihoming, a stub network must pay for connectivity
to a set of different ISPs. We note that different ISPs charge different amounts and therefore the
solution we consider “best” may not be the most cost-effective choice. In the case of overlays, we
envision that there will be overlay service offerings, similar to Akamai’s SureRoute [4]. Users of
overlays with multiple first hop choices (k-overlay routing in our analysis) must add the cost of
subscribing to the overlay service to the base cost of ISP multihoming.¹ Using an overlay with a
single ISP (i.e., 1-overlays) would eliminate this additional cost, but our analysis shows that the
performance gain is reduced significantly.

¹ If the ISPs charge according to usage, then the cost of employing multiple ISP connections in the case of k-overlays
may be higher or lower than the cost of using multiple connections in the case of k-multihoming.

While overlay routing significantly outperforms using BGP paths via a single ISP,
improving the routing flexibility of endpoints by allowing 2-3 ISP connections significantly
reduces the gap between BGP-based routing and overlay routing.

The RTTs from multihoming to 3 ISPs are within 5-15% of those from employing a
combination of overlay routing and multihoming. Throughput differences are more
marginal, with the best overlay path offering only 1-10% higher transfer speeds than the
best multihoming path.

We also notice that multihoming offers similar, if not better, performance than
1-overlay routing. This implies that multihoming at the first hop is absolutely essential to
overcome occasional serious problems in the access ISP.

The primary reason for the ability of overlay routing to offer better performance than
multihoming (albeit only marginally so) is the ability of overlays to circumvent serious
congestion along ISP paths that multihoming cannot avoid.

The superiority of overlay routing can be reduced even further if ISPs employed
cooperative peering policies such as “cold potato” routing.

Multihoming route control cannot offer the near-perfect availability of overlay routing.
Nevertheless, multihoming can eliminate most failures observed by singly-homed end
networks.

Table 5.4: Multihoming Route Control vs. Overlay Routing: Summary of key comparison results.

Deployment and operational overhead. Overlays and multihoming each have their unique set of
deployment and performance challenges that our measurements do not highlight. Below, we consider
the issues of ease of use and deployment, routing table expansion and routing policy violations.

Ease of use and deployment. Overlay routing requires a third party to deploy a potentially large
overlay network infrastructure. Building overlays of sufficient size and distribution to achieve
significantly improved round-trip and throughput performance is challenging in terms of infrastructure
and bandwidth cost, as well as management complexity. On the other hand, since multihoming is
a single endpoint-based solution, it is relatively easier to deploy and use from an end-network’s
perspective.

Routing  table  expansion  due  to  multihoming.  An  important  overhead  of  multihoming  that  we  did 
not  consider  in  this  study  is  the  resulting  increase  in  the  number  of  routing  table  entries  in  backbone 
routers.  ISPs  will  likely  charge  multihomed  customers  appropriately  for  any  increased  overhead 
in  the  network  core,  thus  making  multihoming  less  desirable.  However,  this  problem  occurs  only 
when  the  stub  network  announces  the  same  address  range  to  each  of  its  ISPs.  Since  ISPs  often  limit 
how  small  advertised  address  blocks  can  be,  this  approach  makes  sense  for  large  and  medium  sized 
stub  networks,  but  is  more  difficult  for  smaller  ones.  Smaller  networks  could  instead  use  techniques
based  on  network  address  translation  (NAT)  to  avoid  issues  with  routing  announcements  and  still 
make  intelligent  use  of  multiple  upstream  ISPs  [48].  This  is  discussed  further  in  Chapter  6. 

Violation of policies by overlay paths. One of the concerns that overlay routing raises is its
circumvention of routing policies instituted by intermediate ASes. For example, a commercial endpoint
could route data across the relatively well-provisioned, academic Internet2 backbone by using an
overlay hop at a nearby university. While each individual overlay hop would not violate any policies
(i.e., the nearby university node is clearly allowed to transmit data across Internet2), the end-to-end
policy may be violated. While our analysis quantifies the number of routing policy violations, we
did not consider their impact. Most Internet routing policies are related to commercial relationships
between service providers. Therefore, it is reasonable to expect that the presence of an overlay node
in an ISP network implies that the overlay provider and the ISP have some form of business
agreement. This relationship should require that the overlay provider pay for additional expenses that the
ISP incurs by providing transit to overlay traffic. Network providers would thus be compensated for
most policy violations, limiting the negative impact of overlay routing.

Future changes to BGP. Thus far, we have discussed some important issues regarding overlays
and route control in today’s environment, but have not considered changes to BGP that may further
improve standard Internet routing performance relative to overlays. For example, we only consider
the impact of performance- or availability-based route selection at the edge of the network. It is
possible that transit ASes could perform similar route control in the future, thereby exposing a superior
set of AS paths to end networks. Another future direction is the development of new protocols for
AS-level source-routing, such as NIRA [117], which allow stub networks greater control over their
routes.

Limitations of the Comparison. Finally, we discuss the constraints imposed by our comparison
study. Our observations may be constrained by a few factors such as the size of our testbed, the
coarse granularity of our performance samples, and our limited analysis of resilience. We discuss
these issues in detail below.

Figure 5.12: Impact of overlay network size on round-trip performance: This graph shows the mean difference
between 3-overlays and 3-multihoming as overlay nodes are added (x-axis: number of overlay nodes).

Testbed size. In Figure 5.12 we compare the average RTT performance from 3-multihoming against
3-overlays, as a function of the number of intermediate overlay nodes available. The graph shows
the RTT difference between the best 3-overlay path (direct or indirect) and best 3-multihoming
path, averaged across all measurements as nodes are added one-by-one, randomly, to the overlay
network. A different heuristic of adding nodes may yield different results. As the size of the overlay
is increased, the performance of 3-overlays gets better relative to multihoming. Although the relative
improvement is marginal, there is no discernible “knee” in the graph. Therefore it is possible that
considering additional overlay nodes may alter the observations in our study in favor of overlay
routing. However, the key point to notice from our measurements is that allowing endpoints the
flexibility to select from overlay paths over 68 nodes did not, in the end, offer much additional
benefit over a choice between three paths. To widen the gap between overlay routing and route
control by, say, about 30% or higher, we may require a much larger, geographically diverse overlay
deployment. Such large overlays suffer from a few key drawbacks: they are expensive to deploy,
manage and subscribe to; and they incur enormous overhead when tracking the performance of
overlay paths (since the number of paths to probe scales quadratically in overlay size). Practical
techniques for lowering measurement overhead in larger overlays are yet to be developed.
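
To make the quadratic growth in probing overhead concrete, the sketch below counts the paths a full-mesh overlay would need to track. It assumes that each ordered pair of nodes is probed separately; the function name and the sample sizes other than 68 are illustrative.

```python
def probe_paths(num_nodes):
    """Number of ordered node pairs a full-mesh overlay must probe."""
    return num_nodes * (num_nodes - 1)


for n in (10, 68, 500):
    print(n, probe_paths(n))
```

Under this assumption, the 68-node overlay considered here already involves 4,556 directed paths, and a 500-node overlay would involve 249,500.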

Granularity of performance samples. Our performance samples are collected at fairly coarse timescales
(6-minute intervals for round-trip time and 30-minute intervals for throughput). As a result, our results may
not capture very fine-grained changes, if any, in the performance on the paths, and their effect on
either overlay routing or multihoming route control. However, we believe that our results capture
much of the observable performance differences between the two path selection techniques for two key
reasons: (1) our conclusions are based on data collected continuously over a week-long period, and
across a fairly large set of paths, and (2) Zhang et al. observed that the “steadiness” of both
round-trip time and throughput performance is at least on the order of minutes [118]. As we shall show
in the next chapter, round-trip times on paths in our testbed have mean intervals of several minutes
between changes of 30% or more. As such, we do not expect that a higher sampling frequency
would yield significantly different results.

Repair and failure detection. Our reliability analysis does not compare the relative ability of
overlay routing and multihoming to avoid BGP convergence problems. For example, a peering link
failure may affect routing between the peer ISPs until BGP re-converges. It is possible that some
multihoming configurations cannot avoid such routing failures. We leave this comparison for future
work.


The  key  observations  in  this  comparison  study  can  be  simply  summarized  as  follows:  It  is 
not  necessary  to  circumvent  BGP  routing  to  achieve  good  end-to-end  resilience  and  performance. 
These  goals  can  be  simply  realized  by  enabling  moderately  higher  route  selection  flexibility  at  end 
networks  using  purely  endpoint  based  mechanisms. 

So far, we considered the potential benefits of multihoming route control. We studied an ideal
form of multihoming driven by several simplifying assumptions. In the next chapter, we ask whether
these assumptions can be overcome in a practical deployment scenario while still preserving the
performance benefits of multihoming route control.

Chapter  6


Practical  Multihoming  Route  Control  Strategies 


Over the past few years, multihoming has been increasingly leveraged for improving wide-area
network performance, lowering bandwidth costs, and optimizing the way in which upstream links are
used [76], apart from ensuring resilience from service interruptions. Indeed, a number of products
provide these route control capabilities both to large enterprise customers, which have their own
public AS number and advertise their IP address prefixes to upstream ISPs using BGP [106, 98,
53], as well as to smaller multihomed organizations which do not use BGP [79, 94, 35]. All of these
products use a variety of mechanisms and policies for route control. However, very little is known
about the exact mechanisms employed, or the quantitative benefits of the mechanisms.

In Chapter 4, we studied the potential performance benefits of multihoming route control
mechanisms. We showed that performance could potentially improve by more than 25% in terms of RTT
and 20% in terms of throughput when multiple upstream ISPs are employed. Furthermore, in
Chapter 5, we showed that the performance benefits from multihoming are comparable with techniques
such as overlay routing that allow much greater routing flexibility to end-networks. In either case,
we evaluated an ideal form of multihoming where the end-network has perfect information about
the performance across all ISPs at any time and could change routes arbitrarily often.

In this chapter, our goal is to understand if, and how, these benefits can be realized in a more
practical multihoming scenario. Specifically, we address the following question:

Can the benefits of multihoming route control be realized under practical deployment
situations? If so, what exact mechanisms should end-networks employ?

We explore several design alternatives to realize the performance benefits from multihoming in
practice. Our focus is on enterprise networks with multiple ISP connections. We focus primarily on
mechanisms used for inbound route control, since enterprises are mainly interested in optimizing
network performance for their own clients who download content from the Internet (i.e., sink data).
However, as we discuss in this chapter, our mechanisms can also be extended to multihomed content
provider networks which source more data than they sink.

We develop a variety of active and passive measurement strategies for multihomed enterprises to
estimate the instantaneous performance of their ISP links and pick the best ISP for a given transfer.
We evaluate these strategies in the context of a NAT-based implementation to control the inbound
ISP link used by enterprise connections. We address a number of practical issues such as the
usefulness of past history to guide the choice of the best ISP link, the effects of sampling frequency
on measurement accuracy, and the overhead of managing performance information for a potentially
large set of target destinations. We evaluate these policies using several client workloads, and an
emulated wide-area network where delay characteristics are based on a large set of real network
delay measurements.

Our evaluation of the proposed schemes (for a 3-multihomed enterprise network) shows that
both active and passive measurement-based techniques are equally effective in extracting the
performance benefits of using multiple ISPs. These schemes offer about 15-25% improvement in Web
response times (i.e., Web request completion times) when compared to using a single ISP. In
deciding which ISP to select for a transfer, we show that the most current sample of the performance
to a destination via a given ISP is a reasonably good estimator of the near-term performance to the
destination. We also show that the overhead of collecting and managing performance information
for various destinations is negligible. Finally, we conduct an initial study of mechanisms to
control the ISP link used by external Internet clients who initiate connections to servers hosted in the
enterprise.

Chapter outline. In Section 6.1, we describe our enterprise multihoming solution, various
strategies for estimating ISP performance, and our route control mechanisms. Section 6.2 describes our
implementation in further detail. In Section 6.3, we discuss the experimental set-up and results from
our evaluation of the solution. Section 6.4 discusses additional route control design and operational
issues. In Section 6.5, we summarize common techniques employed in commercial route control
products as well as other related research studies. Finally, Section 6.6 summarizes the observations
in this chapter.

6.1  Solution  Overview 

In order to realize the performance benefits of multihoming in practice, a route control solution
requires three key functions (illustrated in Figure 6.1): (1) monitoring ISP links, (2) choosing the best
ISP link at a given instant, and (3) directing traffic over the best ISP links. We discuss the functional
design of each of these below, and the actual implementation details in Section 6.2.

6.1.1  Monitoring  ISP  Links 

Selecting the right ISP link over which to direct each transfer is crucial to realizing the performance
benefits of multihoming from the enterprise network’s perspective. The choice of the right ISP
clearly depends on the time-varying performance of ISP links to the destination being accessed.
There are two further issues in monitoring performance over ISP links: what to monitor and how.

Figure 6.1: Solution steps: This figure illustrates the three main operations of an enterprise route control system.

An enterprise may ideally like to monitor the performance from all content provider sites over its
ISP links. However, this may be infeasible, as large enterprises often access content from many
different sources. A simple solution then is to monitor only the most important destinations, on the
basis of the volume of requests made from the enterprise (e.g., the top 100 most frequently accessed
destinations). This would ensure that a significant fraction of all flows will still experience good
performance.
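
Selecting the destinations to monitor by request volume can be sketched as follows. This is a minimal illustration, assuming the enterprise can obtain a flat log of requested destination names; the function name, the log format and the example hostnames are hypothetical.

```python
from collections import Counter


def top_destinations(request_log, n=100):
    """Return the n most frequently requested destinations, most popular first."""
    counts = Counter(request_log)
    return [dest for dest, _ in counts.most_common(n)]


# Hypothetical request log: one entry per request made from the enterprise.
log = ["a.example", "b.example", "a.example", "c.example", "a.example", "b.example"]
print(top_destinations(log, n=2))  # ['a.example', 'b.example']
```

Only the destinations returned by such a ranking would then be tracked by the monitoring component.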

For the second question (i.e., how to monitor), two common approaches are active monitoring
and passive monitoring. In active monitoring, the multihomed enterprise performs out-of-band
measurements of performance to or from selected destinations across its ISP links. These
measurements could be simple pings involving, for example, ICMP ECHO_REQUEST or TCP SYN packets
to the destinations. Passive measurement mechanisms rely on observing the performance of
ongoing transfers (i.e., in-band) to destinations, and using these observations as samples for estimating
performance over an ISP link. However, in order to ensure that there are enough samples over all
ISPs, it may be necessary to forcibly direct some transfers over particular ISPs.
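
An active probe of the TCP SYN kind can be approximated from user space by timing a TCP connection setup. The sketch below is illustrative only: it assumes the probing host holds one local address per ISP and that binding the probe to that address steers it over the corresponding link. The function name and parameters are hypothetical, and this is not the probing code used in our implementation.

```python
import socket
import time


def tcp_probe_rtt(host, port=80, source_ip=None, timeout=2.0):
    """Time a TCP handshake to (host, port); return seconds, or None on failure.

    source_ip is an assumed per-ISP local address used to pick the outgoing link.
    """
    source = (source_ip, 0) if source_ip else None
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout,
                                      source_address=source):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and unreachable destinations.
        return None


# Example use (not executed here): tcp_probe_rtt("www.example.com", 80)
```

A failed probe returns None rather than a large number, so that the caller can decide how to treat an unreachable destination.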

An important issue in monitoring performance is the time interval for monitoring. A long
interval between performance samples implies using stale information to estimate ISP performance.
This might result in a suboptimal choice of the ISP link for a particular destination. While using
smaller time intervals could address this issue, it could have a negative impact as well. In active
monitoring, frequent measurements give rise to excessive measurement traffic, wasted bandwidth and
processing overhead. Some destinations might even interpret this traffic as a security threat. In passive
monitoring, frequent sampling may cause too many connections to be directed over suboptimal
ISPs for obtaining “fresh” performance samples. Therefore, a careful choice of the interval size is
important.

6.1.2  Choosing  the  Best  ISP 

The next component is to select the best ISP. This choice must be made on a per-destination
basis, and at fine time-scales. An important question is whether (and how) historical data about ISP
performance to a given destination should be used in determining the ISP to use. The historical
performance of an ISP link to a destination can be tracked by keeping a smoothed, time-weighted
estimate of the performance, for example an exponentially-weighted moving average (EWMA). If the
performance of using an ISP $P$ to reach destination $D$ at time $t_i$ is $s_{t_i}$ (as obtained from active or
passive measurement) and the previous performance sample was from time $t_{i-1}$, then the EWMA
metric at time $t_i$ is:

$$\mathrm{EWMA}_{t_i}(P, D) = \left(1 - e^{-(t_i - t_{i-1})/\alpha}\right) s_{t_i} + e^{-(t_i - t_{i-1})/\alpha}\, \mathrm{EWMA}_{t_{i-1}}(P, D)$$

where $\alpha > 0$ is a constant. A smaller value of $\alpha$ attaches less weight to historical samples. A
value of $\alpha = 0$ (the limiting case) implies no reliance on history. At any time, the ISP with the best
performance as calculated above could be chosen for a given transfer. When no history is employed
($\alpha = 0$), only the most recent performance sample is used to evaluate the ISPs and to select the best.
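
The time-weighted EWMA and the selection step can be sketched in a few lines. This is a minimal illustration under two assumptions that are not stated in the text: the weight on history decays as exp(-Δt/α), a form consistent with the properties described above (smaller α means less weight on history, α = 0 means none), and performance is a response time, so lower is better. All names are illustrative.

```python
import math


def ewma_update(prev_estimate, sample, dt, alpha):
    """Blend a new sample with the previous estimate.

    dt is the time since the previous sample; alpha controls the weight on
    history. With alpha == 0 (or no previous estimate) only the new sample counts.
    """
    if prev_estimate is None or alpha <= 0:
        return sample
    history_weight = math.exp(-dt / alpha)  # assumed decay form
    return (1.0 - history_weight) * sample + history_weight * prev_estimate


def best_isp(estimates):
    """Pick the ISP with the lowest estimate (assumes lower is better)."""
    return min(estimates, key=estimates.get)


# Hypothetical per-ISP estimates (ms) for one destination.
estimates = {"ISP1": 120.0, "ISP2": 80.0, "ISP3": 95.0}
estimates["ISP2"] = ewma_update(estimates["ISP2"], 140.0, dt=30.0, alpha=10.0)
print(best_isp(estimates))
```

Because the weight depends on the elapsed time rather than on the sample count, an old estimate is discounted more heavily when samples arrive infrequently.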

6.1.3  Directing  Traffic  Over  Selected  ISPs

Once the best ISP for a transfer is identified, the traffic from the destination must be directed over
the chosen link. Controlling the outbound direction of traffic is easy and well-studied. Our focus,
rather, is on the inbound route control mechanism. Inbound control refers to selecting the right ISP
or incoming interface on which to receive data. For an enterprise network, the primary mechanisms
available are route advertisements and use of different addresses for different connections. Here, we
discuss how these controls can be implemented.

If an enterprise has its own IP address block, it can advertise different address ranges to its
upstream ISPs. Consider a site multihomed to two ISPs which owns a /19 address block. The
site announces parts of its address block to its ISPs (e.g., a distinct /20 sub-block to each ISP).
Then, depending on which of the two ISP links is considered superior for traffic from a particular
destination, the site would use a source address from the appropriate /20 address block. This ensures
that all incoming packets for the connection would arrive via the appropriate ISP link. In cases where
the enterprise is simply assigned an address block by its upstream ISP, it may be necessary to also
send outbound packets via the desired ISP to ensure that the ISP forwards the packets.¹

While ensuring that a connection uses a particular address lies at the heart of route control,
different techniques must be employed for handling connections that are initiated from the enterprise
than for connections that are accepted into the site from external clients. These are discussed below.

Initiated Connections: Handling connections initiated from an enterprise site amounts to
ensuring that the remote content provider transmits data such that the enterprise ultimately receives it
over the chosen ISP. Inbound control can be achieved by having the edge router translate the source
addresses on the connections initiated from its network to those belonging to the chosen ISP’s
address block (i.e., the appropriate /20 block in the example above) via simple NAT-like mechanisms.
This ensures that the replies from the destination will arrive over the appropriate ISP.
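
The address-block example can be made concrete with a short sketch. It assumes a hypothetical site block of 10.0.0.0/19 (a private range used purely for illustration) split into one /20 per ISP; the function names and the way a host index is mapped to a translated address are illustrative and are not the translation logic of any particular NAT implementation.

```python
import ipaddress


def split_site_block(site_block, isp_names):
    """Assign each ISP a distinct, equal-sized sub-block of the site's block."""
    network = ipaddress.ip_network(site_block)
    # Extra prefix bits needed so that there is at least one sub-block per ISP.
    extra_bits = max(1, (len(isp_names) - 1).bit_length())
    return dict(zip(isp_names, network.subnets(prefixlen_diff=extra_bits)))


def nat_source_address(blocks, chosen_isp, host_index):
    """Pick a source address inside the chosen ISP's sub-block."""
    block = blocks[chosen_isp]
    usable = block.num_addresses - 2  # skip network and broadcast addresses
    return block[1 + host_index % usable]


blocks = split_site_block("10.0.0.0/19", ["ISP-A", "ISP-B"])
print(blocks["ISP-A"], blocks["ISP-B"])         # 10.0.0.0/20 10.0.16.0/20
print(nat_source_address(blocks, "ISP-B", 0))   # 10.0.16.1
```

Since each /20 is announced only to its own ISP, replies addressed to the translated source address return over the link that was chosen.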

Accepted Connections: Inbound route control over connections accepted into a site is
necessary when the enterprise also hosts Internet servers which are accessed from outside. In this case,
inbound control amounts to controlling the path (or the ISP link) on which a given client is forced
to send request and acknowledgment packets to the Web server. This is not easy since predicting
client arrivals and forcing them to use the appropriate server address is generally not possible.

However, techniques based on DNS or deploying multiple versions of Web pages can help to
achieve inbound control for externally initiated connections. For example, the enterprise can use a
different version of a base Web page for each ISP link. The hyperlinks for embedded objects in
the page could be constructed with IP addresses corresponding to a particular ISP. Arriving clients
would be given the appropriate base HTML page such that subsequent requests for the embedded
objects arrive via the selected ISP. On the other hand, the essential function of the DNS-based
technique is to provide the address of the “appropriate” interface for each arriving client. A preliminary
study of the effectiveness of this approach is discussed in Section 6.4.

Our  focus  in  the  rest  of  the  paper,  however,  is  on  the  case  of  enterprise-initiated  connections. 

6.2  Implementation  Details 

We implement the above multihoming route control functions over a simple open source Web proxy
called TinyProxy [112]. TinyProxy is a transparent, non-caching forward Web proxy that manages
the performance of Web requests made by clients in moderate-sized, multihomed enterprises. Below,
we present the details of our implementation of the three basic multihoming components in
TinyProxy. For the sake of simplicity, we assume that the proxy is deployed by a multihomed
end-network with three ISP links.

¹ In fact, like most enterprise route control products, we enforce outbound route control by transmitting packets to a
destination along the same ISP as the one on which the traffic from the destination is received.

6.2.1 Performance Monitoring Algorithms

We implement both the active and passive measurement mechanisms, described in Section 6.1.1,
for monitoring the performance of upstream ISP links.

6.2.1.1  Passive  Measurement 

The passive measurement module tracks the performance to destinations of interest by sampling ISP
links using Web requests initiated by clients in the enterprise. The basic idea is to use new requests
to sample an ISP’s performance to a given destination, if the performance estimate for that ISP is
older than the predefined sampling interval. If the module has current performance estimates for all
links, then the connection is directed over the best link for the destination. The module maintains a
performance hash table keyed by the destination (i.e., either the IP address or the domain name of
the destination). A hash table entry holds the current estimates of the performance to the destination
via the ISPs, along with an associated timestamp indicating the last time instant when performance
to the destination via an ISP was measured. This is necessary for updating the EWMA estimate of
historical performance.

Notice that without some explicit control, the hash table maintains performance samples to all
destinations, including those rarely accessed. This may cause a high overhead of measurement, with
connections to less popular destinations being all used up for obtaining performance samples. While
maintaining explicit TTLs per entry might help flush out destinations that have not been accessed
over a long period of time, it does not guarantee a manageable measurement overhead. Also, TTLs
require maintaining a separate timer per entry, which is an additional overhead.

In view of this, we limit performance sampling to connections destined for the most popular
sites, where popularity is measured in terms of aggregate client request counts. We achieve this in
the following manner: Each hash entry also holds the number of accesses made to the corresponding
destination. Upon receiving a connection request for a given destination, we update the access count
for the destination using an exponentially weighted moving average (EWMA). The EWMA weight
is chosen so that the access count for the destination is reset to approximately 1 if it was not accessed
for a long time, say 1 hour.

We use a hard threshold and monitor performance to destinations for which the total number of
requests exceeds the threshold. We achieve this by looking for live entries in the table where the
access counts exceed the threshold. In a naive hash table implementation for tracking the frequency
counts of the various elements, identifying the popular destinations in this manner may take O(hash
table size) time.
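As an illustration only, the popularity bookkeeping described above could be sketched in Python as follows. This is not the TinyProxy code: the class name `PopularityTable`, the exponential-decay form of the moving average, the time constant `tau`, and the threshold value are all assumptions made for the example.

```python
import math


class PopularityTable:
    """Per-destination access counts with exponential decay (illustrative)."""

    def __init__(self, tau=600.0, threshold=5.0):
        self.tau = tau              # decay time constant in seconds (assumed)
        self.threshold = threshold  # hard popularity threshold (assumed)
        self.entries = {}           # destination -> (count, last_access_time)

    def record_access(self, dest, now):
        count, last = self.entries.get(dest, (0.0, now))
        # Old accesses fade with elapsed time; after a long idle period the
        # decayed term vanishes and the count falls back to about 1.
        count = count * math.exp(-(now - last) / self.tau) + 1.0
        self.entries[dest] = (count, now)
        return count

    def popular(self):
        # Linear scan over all entries: O(hash table size).
        return [d for d, (c, _) in self.entries.items() if c > self.threshold]
```

With `tau` set to ten minutes, a destination that is idle for an hour has its old count scaled by roughly e^-6, so its next access leaves the count close to 1, which matches the reset behaviour described in the text.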

Other ways of tracking top destinations, such as Iceberg Queries [38] or Sample-and-hold [34],
may not incur such an overhead (see [20] for a survey of related approaches). Nevertheless, we
stick with our approach for its simplicity of implementation. Also, as we will show later, the overhead
from looking for the popular hash entries in our implementation is negligible. Note that this
approach does not necessarily limit the actual number of popular destinations, for example in the
relatively unlikely case that a very large number of destinations are accessed very often.

Figure 6.2: Monitoring ISP performance: The passive measurement scheme.

Figure 6.2 shows the basic operation of the passive monitoring scheme. When an enterprise
client initiates a connection, the scheme first checks if the destination has a corresponding entry in
the performance hash table (i.e., it is labeled popular). If not, the connection is simply relayed using
an ISP link chosen in a random load-balancing fashion.

If there is an entry for the destination, the passive scheme scans the measurement timestamps for
the three ISPs to see if the elapsed time since the last measurement on any of the links exceeds the
predefined sampling interval. If so, the current connection is used to sample the destination along
one such ISP link.

In order to obtain a measurement sample on an ISP link, the passive approach initiates a connection
to the destination using a source IP address set such that the response will return via the link
being sampled. Then, it measures the turn-around time for the connection, i.e., the time between
the transmission of the last byte of the client HTTP request, and the receipt of the first byte of the
HTTP response from the destination. The passive scheme uses the observed turn-around time as
the performance sample to the destination, and the corresponding entry in the hash table is updated
using the EWMA method (Section 6.1.2). The remainder of the Web request proceeds normally,
with the proxy relaying the data appropriately.

Figure 6.3: Monitoring ISP performance: The SlidingWindow active measurement scheme.

If  all  ISP  links  have  current  measurements  (i.e.,  within  the  sampling  interval),  the  proxy  initiates 
a  connection  using  the  best  link  for  the  destination  by  setting  the  source  IP  address  appropriately. 
This  action  is  discussed  further  in  Section  6.2.3. 
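The per-connection decision made by the passive scheme can be summarized in a short Python sketch. It is a simplified illustration under stated assumptions: the function name `choose_link`, the table layout, and the policy of sampling the first stale link found are hypothetical and are not taken from the proxy implementation.

```python
import random


def choose_link(table, dest, now, interval, links=(0, 1, 2), rng=random):
    """Return (link, is_sample) for a new client connection (illustrative).

    table maps a popular destination to {link: (estimate, timestamp)}.
    """
    entry = table.get(dest)
    if entry is None:
        # Destination is not tracked as popular: random load balancing.
        return rng.choice(links), False
    stale = [l for l in links
             if l not in entry or now - entry[l][1] > interval]
    if stale:
        # Use this connection to refresh one out-of-date estimate.
        return stale[0], True
    # All estimates are current: use the link with the lowest estimate.
    return min(links, key=lambda l: entry[l][0]), False
```

When `is_sample` is true, the caller would time the request on the returned link and write the new turn-around time and timestamp back into the table.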

6.2.1.2  Active  Measurement 

Similar  to  passive  measurement,  the  active  measurement  scheme  also  maintains  a  hash  table  of  the 
performance  estimates  to  candidate  destinations  over  the  three  ISPs.  For  active  measurement,  we 
use  two  techniques  to  identify  which  destinations  should  be  monitored. 

FrequencyCounts. Just like the passive measurement mechanism, in this scheme we track the
number of client requests directed to each destination. Every T seconds (the sampling interval),
we initiate active probes to those destinations for which the number of requests exceeds a fixed
threshold.

SlidingWindow. This scheme maintains a window of size C that contains the C most recently
accessed destinations (see Figure 6.3). The window is implemented as a fixed-size FIFO queue,
in which destinations from newly initiated connections are inserted. If this causes the number of
elements to exceed C, then the oldest in the window is removed. Every T seconds (the sampling
interval), an active measurement thread scans the window and chooses m% of the elements at random.
After discarding duplicate destinations from this subset, the active measurement scheme measures
the performance to the remaining destinations along the ISPs.
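A minimal Python sketch of the SlidingWindow bookkeeping is given below. The class and method names are hypothetical, and rounding the m% sample size down to an integer is an assumption; the text does not specify how fractional sample sizes are handled.

```python
import random
from collections import deque


class SlidingWindow:
    """FIFO window of the C most recently accessed destinations (illustrative)."""

    def __init__(self, capacity, percent, rng=None):
        self.window = deque(maxlen=capacity)  # holds at most C destinations
        self.percent = percent                # m, expressed as a percentage
        self.rng = rng or random.Random()

    def on_connection(self, dest):
        # deque(maxlen=C) drops the oldest element automatically when full.
        self.window.append(dest)

    def pick_probe_targets(self):
        # Choose m% of the window entries at random (rounded down).
        k = len(self.window) * self.percent // 100
        chosen = self.rng.sample(list(self.window), k)
        # Discard duplicates while keeping the order of first appearance.
        return list(dict.fromkeys(chosen))
```

Because a frequently accessed destination occupies many slots in the window, it is more likely to be drawn, which is how the scheme favours popular destinations without keeping explicit counts.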

The two active measurement schemes have their respective advantages and disadvantages. Both
schemes effectively sample the performance to destinations that are accessed more often relative
to others. However, there are a few key differences. First, FrequencyCounts is deterministic since
it works with a reasonably precise set of the popular destinations. SlidingWindow, on the other
hand, may either miss a few popular destinations, or sample a few unpopular destinations. Second,
FrequencyCounts, in its simplest form, cannot easily track short-term shifts in the popularity of
the destinations. These new, temporarily popular destinations may not receive enough requests to
exceed the threshold and force performance sampling for them. SlidingWindow, on the other hand,
can effectively track small shifts in the underlying popularity distribution of the destinations.

Probe operation. Once a destination is selected for active probing, the active measurement scheme
sends three probes, with different source IP addresses, corresponding to the three ISPs, and waits
for the destination to respond. Since we found that a large fraction of popular Web sites filter ICMP
ECHO_REQUEST packets, we employ a TCP-based probing mechanism. Specifically, we send a
TCP SYN packet with the ACK bit set to port 80 and wait for an RST packet from the destination.
We use the elapsed time as a sample of the turn-around time performance. We found that most sites
respond promptly to the SYN+ACK packets.

When a response is received, we update the performance estimates to the destination for the
corresponding ISP, along with the measurement timestamp. As described above, we update the
performance estimate using an EWMA. If no response is received from a destination (which has an
entry in the performance hash table), then a large positive value is used as the current measurement
sample of the performance.
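The update step can be sketched as follows. The exact EWMA expression is given in Section 6.1.2 and is not reproduced here, so this example assumes the simplest form, in which `alpha` is the weight placed on history and `alpha = 0` keeps only the current sample. The constant `LOST_PROBE_MS` is a hypothetical stand-in for the "large positive value".

```python
LOST_PROBE_MS = 10_000.0  # hypothetical "large positive value" for a lost probe


def update_estimate(old, sample, alpha):
    """Blend a new sample into an estimate; alpha is the weight on history."""
    if sample is None:   # no response was received from the destination
        sample = LOST_PROBE_MS
    if old is None:      # first measurement for this destination and ISP
        return sample
    return (1.0 - alpha) * sample + alpha * old
```

For instance, with an old estimate of 100 ms, a new sample of 50 ms and `alpha = 0.2`, the updated estimate is 0.8 × 50 + 0.2 × 100 = 60 ms.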

6.2.2  Switching  ISPs 

After updating all ISP entries for a destination in the performance hash, we switch to a new ISP only
if it offers at least a 10% better RTT performance over the current best ISP. Since the hash entries
are updated at most once every T seconds (for both the passive and active measurement schemes), the
choice of best ISP per destination also changes at most at the same frequency.
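The switching rule amounts to a simple hysteresis check, sketched below. The function name and the dictionary layout are hypothetical, and "10% better" is interpreted here as an RTT at least 10% lower than that of the ISP currently in use, which is an assumption about the intended comparison.

```python
def select_isp(current, rtts, margin=0.10):
    """Return the ISP to use, switching only for a gain of at least `margin`."""
    best = min(rtts, key=rtts.get)
    if current not in rtts:
        return best
    # Switch only if the candidate's RTT is at least 10% lower than the
    # RTT of the ISP currently in use; otherwise stay put.
    if rtts[best] <= (1.0 - margin) * rtts[current]:
        return best
    return current
```

The margin keeps the choice from flapping between ISPs whose measured RTTs differ only by noise.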

6.2.3  NAT-based  Inbound  Route  Control 

Our inbound route control mechanism is based on manipulating NAT tables at the Web proxy to
reflect the current choice of best ISP. We use the iptables packet filtering facility in the Linux 2.4
kernel to install and update NAT tables at the proxy. The NAT rules associate destination addresses
with the best ISP link such that the source address on packets directed to a destination in the table
is translated to an address that is announced to the chosen ISP.

For example, suppose ISP 1 is selected for transfers involving destination 1.2.3.4 and the address
10.1.1.1 was announced over the link to ISP 1. Then we insert a NAT rule for the destination
1.2.3.4 that (1) matches packets with a source IP of defaultIP and destination 1.2.3.4, and (2)
translates the source IP address on such packets to 10.1.1.1.

Notice that if the NAT rule blindly translates the source IP on all packets destined for 1.2.3.4
to 10.1.1.1, then it will not be possible to measure the performance to 1.2.3.4 via ISP 2, assuming
that a different IP address, e.g., 10.1.1.2, was announced over the link to ISP 2. This is because the
NAT translates the source address used for probing 1.2.3.4 across ISP 2 (i.e., 10.1.1.2) to 10.1.1.1,
since ISP 1 is considered to be the best for destination 1.2.3.4. To get around this problem in
our implementation, we simply construct the NAT rule to only translate packets with a specific
source IP address (in this case defaultIP). Measurement packets that belong to probes (active
measurement) or client connections (passive measurement) are sent with the appropriate source
address, corresponding to the ISP being measured.
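To make the rule concrete, the Python sketch below builds one plausible iptables command for the example above. The text does not give the actual rule syntax used in the implementation, so the choice of the POSTROUTING chain with the SNAT target is an assumption, as are the helper name `snat_rule` and the value 10.1.1.254 used for defaultIP.

```python
def snat_rule(default_ip, dest_ip, isp_ip):
    """Build an iptables command (argv list) for one per-destination SNAT rule."""
    return [
        "iptables", "-t", "nat", "-A", "POSTROUTING",
        "-s", default_ip,   # only rewrite packets sent from the default address
        "-d", dest_ip,      # ... that are headed to this destination
        "-j", "SNAT", "--to-source", isp_ip,
    ]


# Hypothetical defaultIP of 10.1.1.254; ISP 1 announces 10.1.1.1.
print(" ".join(snat_rule("10.1.1.254", "1.2.3.4", "10.1.1.1")))
```

Because the rule matches on the source address, probes and sampled client connections that already carry an ISP-specific source address (such as 10.1.1.2) are not rewritten.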

6.3  Experimental  Evaluation 

In this section, we describe our experimental evaluation of the design alternatives proposed in
Section 6.2. These include the performance of passive versus active monitoring schemes, sensitivity to
various measurement sampling intervals, and the overhead of managing performance information
for a large set of destinations. We focus on understanding the benefits each scheme offers, including
the set of parameters that result in the maximum advantage.

6.3.1  Experimental  Set-up 

We first describe our testbed setup and discuss how we emulate realistic wide-area network delays.
Then we discuss key characteristics of the delay traces we employ in our emulation. Finally, we
discuss the performance metrics we use to compare the proposed schemes.

6.3.1.1  Testbed  topology 

We use the simple testbed topology shown in Figure 6.4(b). Our goal is to emulate a moderately-sized
enterprise with three ISP connections and a client population of about 100 (shown in
Figure 6.4(a)).

Node S in the topology runs a simple lightweight Web server and has one network interface
configured with 100 different IP aliases, 10.1.1.1 through 10.1.1.100. Each alias represents an
instance of a Web server, with 10.1.1.1 being the most popular and 10.1.1.100 being the least popular.

Figure 6.4: Testbed topology: The simple testbed, shown in (b), is used to emulate the route control scenario
shown in (a).


Node C runs 100 instances of clients in parallel. These clients make requests to the Web sites
10.1.1.1 through 10.1.1.100 as follows. The inter-arrival times between requests from a single client
are Poisson-distributed with a mean of $\lambda$ seconds. Notice that this mean inter-arrival time translates
into an average request rate of $100/\lambda$ requests per second at the server S. Each client request is for
the destination $i$, where $i$ is sampled from the set {10.1.1.1, ..., 10.1.1.100} according to a Zipf
distribution with an exponent $\approx 2$. In our evaluation, we set the parameters of the monitoring
schemes (passive and active) so that the average rank of the destinations probed is 20, meaning that
we explicitly track the top 40 most popular sites during each experiment. The object sizes requested
by the client are drawn from a Pareto distribution with an exponent of 2 and a mean size of 5KB.
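A request generator matching this workload description might look like the Python sketch below. It is an illustration rather than the client code used in the testbed: the function name is hypothetical, the gaps between requests are drawn from an exponential distribution (one common reading of the arrival model described above), and the Zipf weights are taken as 1/rank² over the 100 destinations.

```python
import random

DESTS = [f"10.1.1.{i}" for i in range(1, 101)]              # rank 1 is most popular
ZIPF_WEIGHTS = [1.0 / rank ** 2 for rank in range(1, 101)]  # exponent 2


def next_request(rng, mean_gap_s, mean_size_kb=5.0, shape=2.0):
    """Draw (gap_seconds, destination, size_kb) for one client request."""
    gap = rng.expovariate(1.0 / mean_gap_s)   # mean inter-arrival time
    dest = rng.choices(DESTS, weights=ZIPF_WEIGHTS, k=1)[0]
    # A Pareto variable with shape a and minimum x_m has mean a*x_m/(a-1),
    # so x_m = mean*(a-1)/a yields the requested mean object size.
    x_m = mean_size_kb * (shape - 1.0) / shape
    size = x_m * rng.paretovariate(shape)
    return gap, dest, size
```

With a shape of 2 and a mean of 5KB, the smallest possible object is 2.5KB, and with these weights roughly 60% of requests go to the top-ranked destination.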

Node P in the topology runs TinyProxy. It is configured with one “internal” interface on which
the proxy listens for connections from clients within the emulated enterprise. It has another interface
with three IP aliases, 10.1.3.1, 10.1.3.2 and 10.1.3.3, representing addresses announced over
the three ISP links. A separate node is a delay element, running WaspNet [77], a loadable kernel
module providing emulation of wide-area network characteristics on the Linux platform. We modify
WaspNet to enforce packet delays (along with drops, and bandwidth limits) on a per-<source IP,
destination IP> pair basis. We also modify it to support trace-based network delay emulation as
illustrated in Figure 6.4(b).

In  order  to  recreate  realistic  network  delays  between  the  clients  and  the  servers  in  the  testbed,  we 
collect  a  set  of  wide  area  delay  measurements  using  the  Akamai  content  distribution  network.  We 
pick  three  Akamai  server  machines  in  Chicago,  each  attached  to  a  unique  ISP.  We  then  run  pings  at 
regular  intervals  of  10s  from  each  of  them  to  100  other  Akamai  servers  located  in  various  US  cities 
and  attached  to  a  variety  of  ISPs.  The  measurements  were  taken  over  a  one-day  period  on  Dec  7th, 
2003. 

In this measurement, the three Akamai machines in Chicago collectively act as a stand-in for
a multihomed network with three ISP connections. The 100 Akamai servers probed represent
destinations contacted by end-nodes in the multihomed network. We use the series of delay samples
between the three Akamai sources and the 100 destination servers as inputs to the WaspNet module
to emulate delays across the ISP links.

6.3.1.2  Compressing  time 

It  is  quite  time-consuming  to  emulate  the  entire  day’s  worth  of  delays,  multiple  times  over,  to  test 
and  tune  the  parameters  in  the  proposed  scheme.  One  work-around  could  be  to  choose  a  smaller 
portion  of  the  delay  traces  (e.g.,  2  hours).  However,  a  quick  analysis  of  the  delay  traces  we  collected 
shows  that  there  is  little  variation  in  the  delays  along  the  probed  paths  on  a  2-hour  timescale.  Since 
our  goal  is  to  understand  how  effective  our  proposals  are  over  a  wide  range  of  operating  conditions, 
it  is  important  to  test  how  well  the  schemes  handle  frequent  changes  in  the  performance  of  the 
underlying  network  paths.  With  this  in  mind,  we  compress  the  24-hour  delay  traces  by  a  factor  of 
10,  to  2-hour  delay  traces  and  use  these  as  the  real  inputs  to  the  WaspNet  delay  module.  In  these 
2-hour  traces,  performance  changes  in  the  underlying  paths  occur  roughly  10  times  more  often  when 
compared  to  the  full  24-hour  trace.  The  characteristics  of  the  2-hour  delay  traces  collected  from  the 
nodes  in  Chicago  are  shown  in  Table  6.1,  column  2.  We  use  these  delay  traces  in  our  emulation. 

We also want to ensure that the delays observed from the Chicago source nodes were not significantly
different from typical delays experienced by a well-connected, multihomed network located
in a major U.S. metropolitan area. Hence, we collect similar traces from source nodes located in two
other cities, namely New York and Los Angeles. These traces were collected on March 20th, 2004.
The statistics for these latter traces are shown in columns 3 and 4 of Table 6.1. These statistics show
that the Chicago-based traces we use in our experiments have roughly the same characteristics as
those collected at the other cities.

                                                Chicago    NYC      LA
                                                trace      trace    trace
Mean time between performance changes           79s        101s     105s
Standard deviation of time between changes      337s       487s     423s
Mean extent of performance change               ±33%       ±28%     ±34%
Standard deviation of extent of change          ±26%       ±22%     ±27%
Mean time between performance changes of 30%    298s       261s     245s

Table 6.1: Characteristics of the delay traces: Here “performance” refers to the delay along a given path.


6.3.1.3  Comparison  Metric 

To evaluate the benefit from using our proposed route control schemes, we compare the response
time of transfers when using a particular scheme (i.e., $\mathrm{Resp}(x, scheme)$ for a transfer $x$) with the
response time when the best of the three ISPs is employed for any transfer:

\[
\mathcal{R}_{scheme} = \frac{1}{\|x\|} \sum_{x} \frac{\mathrm{Resp}(x, scheme)}{\min_i \{\mathrm{Resp}(x, ISP_i)\}} \tag{6.1}
\]

where $\|x\|$ is the total number of transfers. In this chapter, we call $\mathcal{R}$ the “performance metric” or
the “normalized response time” for the route control mechanism. The closer $\mathcal{R}$ is to 1, the better the
performance of the scheme.

In the above computation, the response times from employing the best ISP any time (the terms
in the denominator above) are computed in an offline manner, by forcing the transfers to use all the
three ISPs, and selecting the ISP offering the best response time.
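The normalized response time defined in this subsection can be computed offline from per-transfer logs. The sketch below is illustrative: the function name and the list-based input layout are assumptions, with each transfer contributing the ratio of its response time under the scheme to the best response time observed over the three ISPs.

```python
def normalized_response_time(scheme_times, isp_times):
    """Average, over transfers, of scheme time divided by best single-ISP time.

    scheme_times[x] is the response time of transfer x under the scheme;
    isp_times[x] lists the response times of the same transfer over each ISP.
    """
    ratios = [s / min(per_isp) for s, per_isp in zip(scheme_times, isp_times)]
    return sum(ratios) / len(ratios)
```

For two transfers with scheme times of 100 and 60 and best single-ISP times of 100 and 50, the ratios are 1.0 and 1.2, giving a metric of 1.1, i.e., 10% away from the optimal value of 1.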


6.3.2  Experimental  Results 

We perform our experiments on the Emulab [33] testbed. We use 600MHz Pentium III machines
with 256MB RAM, running Red Hat 7.3. We first describe how we select different client workloads
in our evaluation, and then move on to the evaluation of our route control strategies.

Figure 6.5: Web server load profile: Average response time in ms, per KB of the request, as a function of the
average client arrival rate at the server in our topology (Figure 6.4(b)).

6.3.2.1  Selecting  the  Client  Workloads 

In Figure 6.5 we show the average response time per KB of client requests (i.e., the completion
time for a request divided by the size of the request in KB), as a function of the average arrival
rate of clients at the server S (i.e., $100/\lambda$ requests/s, where $\lambda$ is the mean inter-arrival time per
client). The response time quickly degrades beyond an
arrival rate of about 15 requests/s. After this point, it increases only marginally with the request
rate. We select five different points on this load curve (highlighted), corresponding to arrival rates
of 1.7, 3.3, 10, 13.3 and 20 requests/s, and evaluate the proposed schemes under these workloads.
These workloads represent various stress levels on the server S, while also ensuring that it is not
overloaded. The high variability in response times in overload regimes may impact the accuracy of
our comparison of the proposed schemes.

In the remainder of this section, we focus on addressing the following questions:

1. To what extent do the route control schemes improve the performance of the multihomed site,
relative to using the single best ISP alone?

2. Does employing historical samples help in better estimating future ISP performance?

3. How do active and passive measurement schemes compare in terms of the performance
improvement they offer? Which of the two active measurement schemes, SlidingWindow or
FrequencyCounts, works better?

4. At what time intervals should samples for ISP performance be collected?

5. What overheads do the proposed mechanisms incur?

The aggregate performance improvement from the passive measurement-based schemes is shown
in Figure 6.6. Here, we set the EWMA parameter $\alpha = 0$ so that only the current measurement samples
are used to estimate ISP performance, and select a sampling interval of 30s. The figure plots
the performance for the five client workloads. In addition, we show the performance from using the
three ISPs individually.

Figure 6.6: Performance improvement: The performance metric $\mathcal{R}$ for the passive measurement scheme with
EWMA parameter $\alpha = 0$ (no history employed) and sampling interval of 30s. The graph also shows the
performance from the three individual ISPs.

Figure 6.7: Unrolling the averages: Ratio and the difference in the response times from using just ISP 3 for all
transfers relative to using the passive measurement scheme. The average client arrival rate in either case is 13.3
requests/s.

The performance improvement relative to the best single ISP is significant: about 20-25% for
the heavy workloads (right end of the graph) and about 10-15% for the light workloads (left end
of the graph). The performance is still about 15-20% away from the optimal value of 1, however.
The results for other sampling intervals (60s, 120s, 300s and 450s) are similar, and are omitted for
brevity. The performance improvements from using the active measurement-based schemes are also
similar and are discussed later.


6.3.2.2  Improvements  from  Route  Control 

Figures 6.7(a) and (b) illustrate the distribution of the response time improvements offered by the
passive measurement scheme (for $\alpha = 0$ and sampling interval = 30s) relative to being singly-homed
to the best ISP from Figure 6.6, i.e., ISP 3. Figure 6.7(a) plots the CDF of the ratio of the response time
from using ISP 3 to the response time from the passive measurement scheme across all transfers.
These results are for the specific instance where the client arrival rate is 13.3 requests/s at the server.
Figure 6.7(b) similarly plots the difference in the response times for the same client workload.

Figure 6.8: Route control at work: The ISPs chosen by the passive measurement-based route control scheme for
destinations with different popularity levels.

Notice, from either figure, that the passive measurement scheme improves the response time
performance for over 65% of the transfers. Figure 6.7(a) shows that this route control scheme
improves the response times by factors as large as 5 for a small fraction of transfers (about 1%),
relative to being singly-homed. Similarly, Figure 6.7(b) shows that the scheme can improve response
times by more than 1s for some transfers. Notice also that the passive measurement-based scheme
ends up offering sub-optimal performance for about 35% of the transfers.

Figure 6.8 illustrates the operation of the passive measurement-based scheme. In this figure,
we show the ISPs used over time for transfers to three different destinations: a popular destination
(10.1.1.4), a moderately popular destination (10.1.1.16), and a less popular destination (10.1.1.38).
Recall that the passive measurement-based scheme explicitly tracks the performance to the 40 most
popular destinations. The sampling interval is 30s and the client arrival rate is about 13.3 requests/s.

From this figure, we note that changes to the route for the popular destination are made every
160s on average. For the moderately popular and less popular destinations, the intervals are 300s and 550s
respectively. For the passive scheme, the number of route changes depends on the popularity of
the destinations: the more popular a destination is, the higher the frequency of its route changes.
Figure 6.8 also shows the optimal choice of ISPs for the popular destination as a function of time,
as determined from the underlying delay traces. Comparing this with the ISPs actually selected by
the scheme, we sometimes see cases where our scheme makes a sub-optimal choice (e.g., between
750-800s, around 1500s, and 2250-2450s).

6.3.2.3  Employing  History  to  Estimate  Performance 

Figure 6.9 plots the performance of the passive measurement scheme for three different values of the
parameter $\alpha$. These correspond to assigning 80%, 50% and 20% weight to the current measurement
sample and the remaining weight to the past samples. Although we only show results for a sampling
interval of 30s, the performance from other interval sizes is similar. The figure also plots the
performance when no history is employed ($\alpha = 0$) and the performance from using ISP 3 alone.
Notice that the performance from employing history is uniformly inferior in all situations, relative
to employing no history. In fact, historical samples only serve to bring performance close to that
from using the single best ISP. These results show that the best way to estimate ISP performance is
to just use the current performance sample as an estimate of near-term performance.

Figure 6.9: Impact of history: The performance achieved by relying on historical samples to varying degrees.
These results are for the passive measurement-based strategy with a sampling interval of 30s.

Figure 6.10: Active vs passive measurement: The performance of the two active measurement-based schemes, and
the passive measurement scheme for a sampling interval of 120s.

6.3.2.4  Active  vs  Passive  Measurement 

In Figure 6.10, we compare the performance from the two active measurement-based techniques
(i.e., SlidingWindow and FrequencyCounts) with the passive measurement approach. Since our
earlier results showed that history does not help in improving performance, henceforth we present
results in which no history is employed. We show the performance of the three schemes
for a sampling interval of 120s. Note that the two active measurement schemes offer comparable
performance. However, the workloads we selected do not bring out other underlying trade-offs
of these schemes. Figure 6.10 also shows that the active measurement-based schemes offer better
performance than the passive measurement scheme: moderate improvements of about 8-10% for
the light workloads and slight improvements of 2-3% for the heavier workloads. This is expected,
since the passive scheme uses existing transfers to obtain samples of performance across potentially
sub-optimal ISP links.

Figure 6.11: Impact of the sampling interval: The performance from using different sampling intervals for the
passive measurement-based scheme (panel (a)) and the FrequencyCounts active measurement-based scheme (panel (b)).

6.3.2.5  Frequency  of  Performance  Monitoring 

Figure 6.11 shows the impact of the measurement frequency on the aggregate performance for
the passive measurement scheme (Figure 6.11(a)) and the FrequencyCounts active measurement
scheme (Figure 6.11(b)). From Figure 6.11(a) we notice that, surprisingly, longer sampling intervals
offer slightly better performance for passive measurement. To understand this better, consider the
curve for the client arrival rate of 10 requests/s. A client arrival rate of 10 requests/s implies that
an average of 10T connections are made by the clients every T seconds, where T is the sampling
interval. However, in order to obtain samples for a fraction f of the 100 destinations over the
three ISPs, the passive measurement scheme will have to force 300f connections across the ISP
links. This leaves a fraction $1 - \frac{300f}{10T}$ of the connections that are not employed for
measurement and could be routed along the optimal ISP, assuming that the passive measurement
yields a reasonably accurate estimate of performance.* As T increases, the fraction of connections
routed over the optimal path is likely to increase, resulting in a marginal improvement in
performance. This explains the slight downward trend in Figure 6.11(a).
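
The arithmetic behind this argument can be checked with a short sketch. The function name and the value f = 0.5 are hypothetical and chosen only to make the trend visible; the 100 destinations, three ISPs, and 10 requests/s come from the text above.

```python
def unmeasured_fraction(arrival_rate, interval_s, sampled_fraction,
                        num_destinations=100, num_isps=3):
    """Fraction of connections per interval that is not forced for measurement."""
    total_connections = arrival_rate * interval_s               # 10T in the text
    forced = num_isps * num_destinations * sampled_fraction     # 300f in the text
    return max(0.0, 1.0 - forced / total_connections)


# Hypothetical f = 0.5; arrival rate of 10 requests/s as in the text.
for interval in (30, 120, 450):
    free = unmeasured_fraction(10, interval, 0.5)
    print(f"T = {interval:3d}s: {free:.3f} of connections free to use the best ISP")
```

Under these assumptions the free fraction grows with T, matching the slight downward trend described for Figure 6.11(a).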

At the same time, infrequent sampling (i.e., large values of T) can have a negative impact on
the overall performance. This can be seen in Figure 6.11(b), which plots the performance from the
FrequencyCounts scheme as a function of the sampling interval. A sampling interval of 450s suffers
a 5-8% performance penalty relative to smaller intervals (e.g., 60s). In the case of FrequencyCounts,
very aggressive sampling (e.g., an interval of 30s) could sometimes have a negative impact on the
overall performance due to increased probing overheads at the proxy.

* About a third of the connections employed for measurement can be expected to be routed along their optimal ISPs.

                                          Passive    Active        Active
                                                     FreqCount     SlidingWin
Total performance penalty                 18%        14%           17%
Penalty from inaccurate estimation only   16%        12%           14%
Penalty from measurement and NAT only     2%         2%            3%

Table 6.2: Analysis of performance overheads: Here “penalty” is defined as the value of $\mathcal{R} - 1$ in each case.

6.3.2.6  Analysis  of  overheads 

As the performance results show, both passive and active measurement are still about 10% away
from the optimal performance (specifically, the active measurement-based approaches). Three key
factors contribute to this gap: (1) the accuracy of measurement techniques, and correspondingly, the
accuracy of ISP choices, (2) overhead of conducting measurement, and (3) software overhead from
making frequent updates to the NAT table* and enforcing NAT rules on most outgoing packets.
In this section, we analyze the contribution of these factors to the performance of the proposed
schemes.

To quantify the overhead of our implementation, we compare the performance due to the choices
made by the route control proxy, with the performance when the best ISP choices are made in
an offline manner for each connection. Recall that in order to compute the performance metric
$\mathcal{R}$, we evaluated the response time of each ISP for every transfer offline so that the best ISP link
for each connection was known, independent of the route control mechanisms (the terms in the
denominator in Equation 6.1). If we combine these offline response time values with the decisions
made by the proxy, we can estimate the performance penalty due to incorrect choices, independent
of the software overheads (i.e., #2 and #3 above). The difference between the resulting performance
metric, $\mathcal{R}$, and 1 gives us the performance penalty, not including overheads of the implementation.
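
The offline procedure just described can be sketched as follows. Equation 6.1 is not reproduced in this section, so the sketch assumes, purely for illustration, that the metric is the ratio of the total response time under the proxy's choices to the total response time under the best offline choices. The function name and the sample values are hypothetical.

```python
def penalty_from_choices(offline_times, proxy_choices):
    """Estimate the penalty due to incorrect ISP choices only.

    offline_times: one dict per connection mapping ISP name to the response
                   time (seconds) measured offline for that ISP.
    proxy_choices: the ISP the proxy picked for each connection.
    Assumption: the metric is a ratio of summed response times.
    """
    chosen_total = sum(times[isp]
                       for times, isp in zip(offline_times, proxy_choices))
    best_total = sum(min(times.values()) for times in offline_times)
    return chosen_total / best_total - 1.0


# Two hypothetical connections; the proxy picks the best ISP only for the first.
offline = [
    {"isp1": 1.0, "isp2": 2.0, "isp3": 1.5},
    {"isp1": 3.0, "isp2": 1.0, "isp3": 2.0},
]
penalty = penalty_from_choices(offline, ["isp1", "isp3"])  # (1.0 + 2.0) / (1.0 + 1.0) - 1
```

Because the offline response times exclude probing and NAT costs, a penalty computed this way isolates the effect of the ISP choices themselves.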

We show the penalties from the three proposed schemes, obtained as stated above, in Table 6.2,
row 2. The client arrival rate is 13.3 requests/s and the sampling interval is 30s. In this table, the numbers
in row 1 show the actual performance penalties suffered by the schemes in our implementation,
taking all overheads into account (from Figure 6.11(a) and (b)). Notice that a large portion of the
overall penalty is contributed by the inaccuracies in measurement and ISP selection (rows 1 and
2 are almost identical). Measurement and software overheads themselves result in a performance
penalty of 2-3% (difference between rows 1 and 2, shown in row 3). As we mentioned earlier, active
measurement-based techniques offer better overall performance. Also, the results above show that
the measurement and route control overhead in active measurement schemes is small. This suggests
that commercial route control products should prefer employing active measurement-based route
control schemes to passive schemes.

* We could allow routes to change less frequently than the sampling interval, T (e.g., every T' > T seconds), but since
we do not use performance history, this would be equivalent to sampling and updating every T' seconds.

6.4  Additional  Design  and  Operational  Issues 

The route control mechanisms we presented here are a first attempt at understanding how to extract
good performance from multiple ISP connections in practice. There are clearly a number of ways in
which they can be improved. Also, we do not address several important issues, such as ISP costs and
joint optimization of performance and reliability. Below, we briefly discuss some of these potential
improvements and issues.

Handling lost probes. In our implementation of the active probing schemes, we send just one
probe when collecting a performance sample for an (ISP link, destination) pair. It is possible that
lost probes, e.g., due to transient congestion or even timeouts, may be misinterpreted as poor
performance of the ISP path to the destination. This can cause unwanted changes in the ISP choice
for the destination. We can prevent this by sending a short burst of, say, three probes per (ISP link,
destination) pair. The performance reported by all three probes can then be used to estimate the
quality of the ISP link, perhaps with a weighting to account for any observed losses.
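
One way to combine a burst of probes is sketched below. The text does not specify the weighting, so the rule used here (mean RTT of the answered probes plus a fixed penalty scaled by the loss rate) and the names `estimate_from_burst` and `loss_penalty_s` are assumptions for illustration only.

```python
def estimate_from_burst(rtts, loss_penalty_s=1.0):
    """Combine a short burst of probes into one quality estimate (lower is better).

    rtts: one entry per probe; a float RTT in seconds, or None if the probe
          was lost or timed out.
    loss_penalty_s: hypothetical cost, in seconds, added per unit loss rate.
    """
    if not rtts:
        raise ValueError("at least one probe is required")
    answered = [rtt for rtt in rtts if rtt is not None]
    if not answered:
        return float("inf")  # every probe was lost: treat the path as unusable
    mean_rtt = sum(answered) / len(answered)
    loss_rate = (len(rtts) - len(answered)) / len(rtts)
    return mean_rtt + loss_rate * loss_penalty_s


# A single lost probe in a burst of three raises the estimate but does not
# make the ISP link look failed, unlike a single lost probe sent on its own.
one_lost = estimate_from_burst([0.1, None, 0.1])
all_lost = estimate_from_burst([None, None, None])
```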

Hybrid passive and active measurements. The accuracy of passive measurement can be improved
by sending active probes immediately after a failed passive probe, for example when the observed
connection ends unexpectedly. This increases confidence that the failed connection is due to a
problem with the ISP link, as opposed to a transient effect.

In  our  implementation,  paths  to  less  popular  destinations  are  not  explicitly  monitored  (in  both 
active  and  passive  schemes).  As  a  result,  we  may  have  to  rely  on  passive  observations  of  transfers  to 
unpopular destinations to ensure quick fail-over. For example, whenever the proxy observes several
failures  on  connections  to  an  unpopular  destination,  it  can  immediately  switch  the  destination’s 
default  ISP  to  one  of  the  remaining  ISPs  for  future  transfers. 
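
The fail-over rule for unpopular destinations can be sketched as a small counter-based tracker. The class name, the threshold of three consecutive failures, and the round-robin choice of the next ISP are hypothetical; the text only says that the proxy switches to one of the remaining ISPs after several failures.

```python
from collections import defaultdict


class FailoverTracker:
    """Switch a destination's default ISP after repeated connection failures."""

    def __init__(self, isps, failure_threshold=3):
        self.isps = list(isps)
        self.failure_threshold = failure_threshold  # assumed value
        self.default_isp = {}                       # destination -> ISP
        self.failures = defaultdict(int)            # destination -> consecutive failures

    def isp_for(self, destination):
        return self.default_isp.setdefault(destination, self.isps[0])

    def record(self, destination, succeeded):
        """Record a passively observed connection outcome; return the ISP to use next."""
        current = self.isp_for(destination)
        if succeeded:
            self.failures[destination] = 0
            return current
        self.failures[destination] += 1
        if self.failures[destination] < self.failure_threshold:
            return current
        # Several failures observed: move to the next ISP in the list.
        next_index = (self.isps.index(current) + 1) % len(self.isps)
        self.default_isp[destination] = self.isps[next_index]
        self.failures[destination] = 0
        return self.default_isp[destination]
```

A success resets the counter, so isolated failures caused by transient effects do not trigger a switch.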

Balancing performance and resilience. A key function of most route control products is to respond
quickly to ISP failures. One of our findings in this study is that a relatively long sampling interval
provides significant performance improvements. However, a long interval can slow down the
end-network’s reaction to path failures. Smaller sampling intervals can offer better resilience. For
example, a sampling interval of 60s with active measurement works well in such cases, providing
reasonably low overhead and good performance (Figure 6.11(b)), while ensuring a failover time of
about one minute.

ISP pricing structures. In our study, we ignore issues relating to the cost of the ISP links.
Different ISP connections may have very different pricing policies. One may charge a flat price up
to some committed rate, while another may use purely usage-based pricing or charge differently
depending on whether the destination is “on-net” or “off-net.” A more formal and thorough discussion
of techniques for optimizing ISP usage costs as well as performance may be found in [45]. While
we do not explicitly consider how to optimize overall bandwidth costs, we nevertheless believe that
our evaluation of active and passive monitoring, and the utility of history, are central to more general
schemes that optimize cost and performance together (such as the algorithms in [45]).

Long-lived TCP flows. In our route control schemes, an update to a NAT entry for a destination in
the midst of an ongoing transfer involving that destination could cause the transfer to fail (due to the
change in source IP address). We did not observe many failed connections in our experiments, and
most of the flows were very short. However, this effect is nevertheless likely to have a pronounced
impact on the performance of long-lived flows. It is possible to address this problem by delaying
updates to the NAT table until after ongoing large transfers complete. However, this increases the
complexity of the implementation, as it involves identifying flow lengths, and checking for the
existence of other long-lived flows at the time of update. It may also force other short flows to the
same destination to traverse sub-optimal ISPs while the NAT update is delayed.
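
The delayed-update idea can be sketched with a table that parks a requested change while long-lived flows to the destination are in progress. The class and method names are hypothetical, and the sketch assumes that the proxy can already tell which flows are long-lived, which the text identifies as the hard part.

```python
class DeferredNatTable:
    """Hypothetical NAT table that postpones updates during long-lived flows."""

    def __init__(self):
        self.active_isp = {}    # destination -> ISP currently used for NAT
        self.pending_isp = {}   # destination -> ISP waiting to be installed
        self.long_flows = {}    # destination -> number of ongoing long-lived flows

    def long_flow_started(self, destination):
        self.long_flows[destination] = self.long_flows.get(destination, 0) + 1

    def long_flow_finished(self, destination):
        remaining = max(self.long_flows.get(destination, 0) - 1, 0)
        self.long_flows[destination] = remaining
        # Install the deferred update once no long-lived flow is left.
        if remaining == 0 and destination in self.pending_isp:
            self.active_isp[destination] = self.pending_isp.pop(destination)

    def request_update(self, destination, isp):
        if self.long_flows.get(destination, 0) > 0:
            self.pending_isp[destination] = isp   # defer: a transfer is ongoing
        else:
            self.active_isp[destination] = isp    # safe to switch immediately
```

While an update is pending, new short flows to the same destination keep using the old ISP, which is the sub-optimality noted above.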

Impact on routing table sizes. Announcing small, non-aggregatable address sub-blocks to different
upstream ISPs (Section 6.1.3) could affect the size of routing tables in the core of the network.
This problem can be overcome if multihomed end-networks obtain provider-assigned addresses:
instead of buying a single individual IP address block, an end-network simply acquires equal-sized
independent address blocks from each of its ISPs. These address blocks could then be further
aggregated by the ISPs.

Global effects of route control. Another important issue is the potential impact of the interactions
between many enterprises deploying route control mechanisms. This will likely have an effect not
only on the marginal benefits of the route control solutions themselves, but also on the network as
a whole. However, a recent simulation-based study of this problem by Qiu et al. [45] has shown
that the impact of multiple end-networks employing route control on any single multihoming
user is minimal at the equilibrium of the interactions. Similarly, the impact, at equilibrium, on
singly-homed users is also negligible. While these are positive observations, the issues of whether
end-networks can reach an equilibrium, and how stable the equilibrium is, still remain open.

About externally-initiated connections. Our implementation primarily considered handling
connections initiated from within the enterprise, as these are common for current enterprise applications
(e.g., to contact content providers). A route control product must also handle connections from
outside clients to enable optimized access to servers hosted in the enterprise network. Recently, several
route control device vendors have introduced features to use Domain Name System (DNS)
resolution requests as a means to direct inbound client traffic over the desired link. Next, we describe
preliminary measurements regarding the usefulness of DNS-based approaches for externally-initiated
connections. A more detailed description may be found in [85].

6.4.1  DNS  for  Inbound  Route  Control 

The destination IP address used by a remote client determines which ISP link is used for the
connection request. Hence, by responding with the appropriate IP address when the client makes a
request to resolve a service name (e.g., www.service.com), the inbound link can be selected. This is
the key idea behind DNS-based route control and is very similar to using DNS as a server selection
mechanism in content distribution networks [102].

While  DNS  is  a  convenient  and  relatively  transparent  mechanism,  it  is  unclear  whether  it  can 
respond  quickly  enough  for  dynamic  route  control.  Responses  to  name  resolution  requests  have 
an  associated  time-to-live  (TTL)  value  that  determines  how  long  the  response  should  be  cached  by 
the  client’s  local  name  server.  Ideally,  by  setting  the  TTL  to  a  very  small  value  (e.g.,  10s  or  even 
zero),  it  is  possible  to  force  external  clients  to  resolve  the  IP  address  frequently,  thus  providing 
fast  responsiveness.  In  practice,  however,  this  is  complicated  by  the  behavior  of  the  wide  variety 
of  applications  and  DNS  servers  deployed  in  the  Internet.  Many  applications  perform  their  own 
internal  DNS  caching  that  does  not  adhere  to  the  expected  behavior,  and  some  older  implementations 
of DNS software have been reported to ignore low TTL values. These artifacts make it difficult to
predict  how  quickly  clients  will  respond  to  changes  communicated  via  DNS  responses. 
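
The idea of steering inbound connections through DNS responses can be sketched as follows. This is not a DNS server implementation: the function, the per-link performance input, and the addresses (taken from the IPv4 documentation ranges) are hypothetical, and the sketch simply picks the address that belongs to the currently preferred ISP link and attaches a small TTL.

```python
# Hypothetical mapping from each ISP link to the service address reachable
# through it. The addresses come from the IPv4 documentation ranges.
LINK_ADDRESSES = {
    "isp1": "192.0.2.10",
    "isp2": "198.51.100.10",
    "isp3": "203.0.113.10",
}


def answer_for(service_name, link_rtts, ttl_s=10):
    """Build the answer an authoritative name server could return.

    link_rtts: current performance estimate (seconds) per ISP link.
    ttl_s: small TTL so that well-behaved resolvers re-resolve frequently.
    """
    best_link = min(link_rtts, key=link_rtts.get)
    return {
        "name": service_name,
        "address": LINK_ADDRESSES[best_link],
        "ttl": ttl_s,
    }


answer = answer_for("www.service.com", {"isp1": 0.08, "isp2": 0.05, "isp3": 0.12})
```

As the preceding paragraph notes, applications and resolvers that cache beyond the TTL will keep using an old address, so the selection above takes effect only as fast as clients actually re-resolve.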

In order to quantify the responsiveness of DNS in practice, we perform a simple analysis of
client behavior in response to DNS changes during a large Web event. We collect logs from a set
of Web caches that served requests for content related to a Summer 2003 sporting event with a global
audience. During the event, when the request rate was very high, the authoritative name servers
directed all clients to the set of caches with a 10min TTL. After the event, the name servers were
updated to direct clients to lower capacity origin servers. Ideally, all traffic to the caches should
subside after 10min.

Figure  6.12  shows  the  aggregate  request  volume  to  all  caches  over  time,  just  before  and  after 
the  DNS  change,  where  requests  were  gathered  into  1-minute  intervals.  During  the  one-hour  period 
after  the  DNS  change,  requests  came  from  about  13400  unique  client  IP  addresses  and  5600  unique 
IP  subnets.  The  number  of  subnets  is  computed  by  clustering  client  IP  addresses  using  BGP  tables 
obtained  from  [60,  114]. 

Figure 6.12(a) shows the last part of the trace, with a clear peak occurring on the last day of the
event, followed by a period of relatively constant and sustained traffic, and finally a sharp drop-off
corresponding to the time when the DNS is updated. Figure 6.12(b) focuses on the time around the
DNS update; the solid line denotes the time of the update and the dashed line is the time when the
10min TTL expires. Between these times, the request volume decreases by 66%. The remaining
third of the traffic decays very slowly over a period of more than 12 hours. While this analysis is not
definitive, it does suggest that DNS is, at best, a coarse-grained mechanism for controlling traffic.

(a) 2-day trace   (b) 2-hour portion

Figure 6.12: DNS responsiveness: This figure shows traffic volume over time just before and after a DNS change.
The left graph (a) shows a 2-day period around the end of the event, while (b) focuses on a 2-hour period around
the time of the DNS update.

6.5  On  Common  Route  Control  Practices 

Our focus in this chapter was on improving Internet performance of enterprise networks via route
control. Several research studies and products have considered other benefits of multihoming route
control. For example, in [48, 94], Guo et al. conduct trace-driven experiments to evaluate several
design options using a commercial multihoming device. They evaluate the ability of several
algorithms to balance load over multiple broadband-class links to provide service similar to a single
higher-bandwidth link. The authors find that the effectiveness of hash-based link selection (i.e.,
hashing on packet header fields) in balancing load is comparable to load-based selection. In
addition, their results show that managing load at a connection-level granularity is only slightly less
effective than per-packet load balancing. Andersen et al. similarly consider various mechanisms for
improving the reliability of Web access for DSL clients in [15].
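
Hash-based link selection can be illustrated with a minimal sketch. The choice of header fields (the connection 5-tuple) and of SHA-256 as the hash are assumptions; the cited studies and commercial devices may hash different fields with a different function.

```python
import hashlib


def select_link(src_ip, dst_ip, src_port, dst_port, protocol, links):
    """Pick an ISP link by hashing packet header fields.

    Every packet of a connection carries the same 5-tuple, so the whole
    connection maps to one link (connection-level granularity).
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(links)
    return links[index]


links = ["isp1", "isp2", "isp3"]
# Hypothetical flows from one client to one server, differing in source port.
chosen = [select_link("10.0.0.5", "192.0.2.80", port, 80, "tcp", links)
          for port in range(40000, 41000)]
```

No load information is needed to make the choice, which is what makes the reported parity with load-based selection notable.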

A number of vendors have recently developed dedicated networking appliances [92, 35, 79] or
software stacks [107, 93] for optimizing the use of multihomed connectivity in enterprise settings
where BGP is not used. Most of these products use techniques similar to the ones we evaluate in our
study. However, their focus is geared more toward balancing load and managing bandwidth costs
across multiple ISP links, rather than optimizing performance. All of these use NAT-based control
of inbound traffic and DNS to influence links used by external client-initiated connections. They
also ensure, by tracking sessions or using policy-based routing, that the same ISP link is used in
both directions.

Another class of products and services is targeted at settings where BGP is employed, such
as  large  data  centers  or  campuses  [98,  103].  These  products  mainly  focus  on  outbound  control  of 
routes  and,  as  such,  are  more  suited  for  content  providers  which  primarily  source  data.  Details 
of  the  algorithms  used  by  any  of  the  above  commercial  products  to  monitor  link  performance  or 
availability  are  generally  proprietary,  and  little  information  is  available  on  specific  mechanisms 
or  parameter  settings.  Here,  we  review  the  general  approaches  taken  in  enterprise  route  control 
products. 

Most  products  employ  both  ICMP  ping  and  TCP  active  probes  to  continuously  monitor  the 
health  of  ISP  links,  enabling  rapid  response  to  failure.  In  some  cases,  hybrid  passive  and  active 
monitoring  is  used  to  track  link  performance.  For  example,  when  a  connection  to  a  previously 
unseen  destination  is  initiated  from  an  enterprise  client,  active  probes  across  the  candidate  links 
sample  performance  to  the  destination.  Connections  to  known  destinations,  on  the  other  hand,  are 
monitored  passively  to  update  performance  samples.  Another  approach  is  to  use  active  probing  for 
monitoring  link  availability,  and  passive  monitoring  for  performance  sampling.  Some  products  also 
allow static rules to dictate which link to use to reach known destination networks.

Finally, some products and recent research studies [15] have suggested using “race”-based ISP
performance measurements, wherein SYN packets sent by enterprise clients to initiate connections
are replicated by the route control device on all upstream ISPs (using source NAT). The link on
which the corresponding SYN-ACK arrives from the server first is used for the remainder of the
connection. The route control device sends RST packets along the slower paths so that the server can
terminate the in-progress connection establishment state. The choice of best link is cached for some
time so that subsequent connections that arrive within a short time period need not trigger a new
race unless a link failure is detected. It is easy to extend the active and passive probing techniques
presented in this chapter to incorporate “race”-based performance optimization. We believe that
this approach will further enhance the performance improvements from the route control techniques
discussed in this dissertation.
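
The selection and caching logic of a race can be sketched without any packet handling. In the sketch below, `run_race` stands in for replicating the SYN on all upstream ISPs and timing the SYN-ACKs; the class name, the cache lifetime, and the callback interface are all hypothetical.

```python
class RaceSelector:
    """Cache the winner of a SYN race per destination for a short time."""

    def __init__(self, cache_ttl_s=30.0):
        self.cache_ttl_s = cache_ttl_s   # assumed cache lifetime
        self._cache = {}                 # destination -> (link, expiry time)

    def choose(self, destination, now, run_race):
        """Return the link to use, racing only when no fresh winner is cached.

        run_race(destination) must return {link: SYN-ACK delay in seconds},
        with None for links on which no SYN-ACK arrived.
        """
        cached = self._cache.get(destination)
        if cached is not None and cached[1] > now:
            return cached[0]
        delays = run_race(destination)
        answered = {link: d for link, d in delays.items() if d is not None}
        if not answered:
            return None                  # no link reached the server
        winner = min(answered, key=answered.get)
        self._cache[destination] = (winner, now + self.cache_ttl_s)
        return winner

    def invalidate(self, failed_link):
        """Drop cached winners that use a link on which a failure was detected."""
        self._cache = {dest: entry for dest, entry in self._cache.items()
                       if entry[0] != failed_link}
```

Sending RST packets on the losing paths is outside the scope of this sketch; only the decision of which link wins and how long that decision is reused is modeled.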


6.6  Summary  of  Observations  and  their  Implications 

Our goal in this chapter was to experimentally evaluate a variety of practical mechanisms and
policies for realizing performance benefits from ISP multihoming. We focused on the scenario of
multihomed enterprises that wish to leverage multiple ISPs to improve the response time performance
for clients downloading content from Internet Web servers. Using a real Linux-based route control
implementation and an emulated wide-area network testbed, we evaluated several design
alternatives. These included the performance of passive versus active monitoring schemes, sensitivity to
various measurement sampling intervals, and techniques to manage performance information for a
large set of destinations.

The key findings from our evaluation of a 3-multihoming scenario are summarized in Table 6.3.
Our evaluation shows that both active and passive measurement-based route control schemes offer
significant performance benefits in practice, between 15% and 25%, when compared to using the
single best-performing ISP. The performance from these schemes is within 5-15% of the optimal
possible benefits. For light to moderate client workloads the performance from active measurement
schemes was much better. Our experiments also show that using historical performance to choose
the best ISP link is not necessary: the most current measurement sample gives a good estimate.
We showed that the performance penalty from collecting and managing performance data across
various destinations is negligible.

To summarize, the key contribution of our route control implementation is to show that the
benefits of multihoming route control can be realized in practice using very simple mechanisms,
e.g., active and passive probing, and NAT. In effect, we showed that increasing the Internet
routing flexibility of end-networks by modest amounts using purely end network-based mechanisms
substantially improves their Internet performance.

So far, we have shown how multihoming route control can effectively improve the Internet
performance of endpoints in the network today. However, as more and more endpoints employ
higher-speed access links, and as the Internet grows in size, the volume of traffic as well as the
application mix in the network will change drastically from what we observe today. An important
question to ask, then, is whether techniques such as multihoming route control can be effective in
the future Internet. We address this issue in the next chapter.

The route control schemes we describe can significantly improve client Web response
times at a multihomed site, by up to 25% in our experiments.

The performance from our proposed route control mechanisms is about 10% away from
the optimal benefits, on average.

Relying on historical samples to monitor performance of ISPs (e.g., using EWMA) is
not very useful, and sometimes may be detrimental to performance. The most current
measurement sample is a very good estimator of near-term performance of an ISP link.

Both passive and active measurement-based schemes offer better performance than using
single ISP connections. However, the latter approach offers substantially better
performance for light to moderate client workloads.

The overhead introduced by aggressive performance sampling may slightly reduce the
overall performance benefit of route control schemes. A sampling interval of one
minute, on the other hand, seems to offer good performance-overhead trade-offs.

The overhead from measurements (in active measurement schemes) and frequent
updates to the NAT table is negligible. Most of the performance penalties arise from the
inaccuracies of the measurement and estimation techniques.

DNS is not an effective mechanism to achieve inbound control of requests for externally
initiated connections.

Table 6.3: Practical Multihoming Route Control: Summary of key observations regarding route control
implementation.

Chapter 7


Scaling  of  Congestion  in  the  Internet 


The Internet grows in size every day. As time progresses, more end-hosts are added to the edge of
the network. Correspondingly, to accommodate these new end-hosts, ISPs add more routers and
links. History has shown that the addition of these links maintains certain macroscopic properties
of the Internet. For example, the interconnection between ISPs in the Internet continues to follow a
power law graph structure [37]. Also, the addition of new end-hosts over time places a greater load
on the network as a whole. Fortunately, the improvement of network technology operates over the
same time period. We expect the network links at the edge and core of the network to improve by
a similar performance factor, since they both typically follow similar Moore’s Law-like technology
trends.

Unfortunately,  due  to  the  topology  of  the  network  and  behavior  of  Internet  routing,  the  increase 
in  load  may  be  different  on  different  links  of  the  network.  As  a  result,  it  may  be  necessary  for  the 
speed  of  some  key  “centrally  located”  hot-spot  links  in  the  network  to  improve  much  more  quickly 
than others. If this is true, then these key parts of the network will eventually emerge as persistent
bottlenecks.  Under  these  circumstances,  routing-based  mechanisms  can  no  longer  be  employed  by 
end-networks  to  improve  their  performance.  Consider  route  control,  for  example.  Multihoming 
paths  will  still  have  to  traverse  links  in  tier-1  ISPs  (due  to  the  global  reach  of  the  ISPs).  Similarly, 
for  most  placements  of  overlay  nodes,  even  overlay  paths  will  traverse  tier-1  networks  (although 
the  likelihood  of  this  happening  is  lower  compared  to  route  control).  Therefore,  if  links  belonging 
to  such  backbone  carriers  emerge  as  hot-spots,  neither  approach  can  help  end-networks  overcome 
these  inevitable  hot-spots.  We  can  then  say  that  the  network  has  poor  scaling  properties.  In  such 
a  situation,  we  would  either  need  to  drastically  adjust  the  Internet’s  routing  behavior  or  change  the 
structure  of  the  network  (for  example,  the  interconnections  between  ISPs)  to  ensure  good  future 
performance. 

On  the  other  hand,  if  the  worst  congestion  scales  well  with  the  network  size  then  we  can  expect 
the  network  to  continue  to  operate  as  it  does  now.  Also,  routing-based  mechanisms  will  continue 
to  be  effective  at  improving  endpoint  performance.  In  this  chapter,  we  perform  a  preliminary  study 
of the scaling properties of the Internet. Using reasonably realistic theoretical models of network
evolution and inter-domain routing, we seek to answer the following question:


How does the maximum congestion in the Internet scale with the network size?

Our analysis focuses on the Internet AS-level interconnection. We consider a model of network
evolution based on Preferential Connectivity [21], and a simple model of traffic in which a unit
amount of flow between every pair of nodes is routed along the shortest path between them. We
employ simple combinatorial/probabilistic arguments to give bounds on the maximum congestion
in the AS-level graph. We also conduct simulations of the congestion on the links in the network,
based both on real and on synthetically generated AS-level topologies and synthetic traffic matrices.
Through our simulations, we also investigate the impact of several key factors on the worst
congestion in the network, such as variants of the inter-domain routing algorithm, alternate traffic matrices,
and finally, alternate topologies for the AS-level interconnection.
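
The traffic model described above can be made concrete with a small, self-contained sketch. It grows a graph by a simple preferential attachment rule, routes one unit of flow between every pair of nodes along a single shortest path, and reports the largest load on any edge. The generator details (seed clique, `m` links per new node) and the BFS tie-breaking rule are assumptions for illustration and are not the exact models or simulator used in this chapter.

```python
import random
from collections import deque


def preferential_graph(n, m=2, seed=0):
    """Grow a graph in which each new node links to m existing nodes, chosen
    with probability proportional to their current degree (requires n > m)."""
    rng = random.Random(seed)
    adj = {node: set() for node in range(n)}
    endpoints = []  # every node appears once per incident edge
    for u in range(m + 1):              # seed: a clique on m + 1 nodes
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
            endpoints += [u, v]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))  # degree-proportional pick
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
    return adj


def max_edge_congestion(adj):
    """Route one unit of flow per node pair along one BFS shortest path and
    return the largest number of flows crossing any single edge."""
    load = {}
    for src in adj:
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in sorted(adj[u]):    # fixed order makes tie-breaking repeatable
                if v not in parent:
                    parent[v] = u
                    queue.append(v)
        for dst in adj:
            if dst <= src or dst not in parent:
                continue                # count each unordered pair once
            node = dst
            while parent[node] is not None:
                edge = (min(node, parent[node]), max(node, parent[node]))
                load[edge] = load.get(edge, 0) + 1
                node = parent[node]
    return max(load.values())


for size in (50, 100, 200):
    print(size, max_edge_congestion(preferential_graph(size)))
```

Printing the maximum congestion for increasing sizes gives a rough empirical view of how it grows with the number of nodes in this toy model.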

The key result in this chapter is that the maximum congestion in Internet-like graphs scales
poorly with the growing size of the graph. Specifically, the maximum congestion for shortest path
routing and uniform traffic matrices is worse than [...], with the exponent depending on the
degree of “skew” in the power law degree distribution of the graph.* Our simulations show that
policy routing in the AS graph results in roughly the same maximum congestion as shortest path
routing, but certainly not worse. When alternate, non-uniform traffic models are considered, the
congestion scaling properties of power law graphs worsen substantially. We also show that in terms
of the maximum congestion, power law trees are considerably worse than power law graphs. In
contrast, graphs with exponential degree distribution have very good congestion properties.

Further, we also discuss simple guidelines to change the ISP-level interconnections that result
in a dramatic improvement in the congestion scaling properties of Internet-like graphs. We show
that when parallel links are added between adjacent nodes (i.e., ASes) in the network according to
simple functions of their degrees or the number of neighboring ASes (e.g., the minimum of the two
degrees), the maximum congestion in the resulting graph scales linearly. This heuristic for adding
edges reflects the desired amount of peering between neighboring ASes in the Internet in order to
guarantee good overall congestion scaling properties.
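
The parallel-link heuristic can be illustrated with a minimal sketch. It assumes the AS graph is held as an adjacency dictionary and uses the minimum of the two endpoint degrees, which is the example function named in the text. The function name `parallel_link_counts` and the data layout are illustrative assumptions, not part of the original study.

```python
def parallel_link_counts(adj):
    """Map each undirected edge to min(deg(u), deg(v)).

    adj: dict mapping node -> iterable of neighbors (simple undirected graph).
    The returned value is the hypothetical number of parallel links that the
    heuristic would place between the two endpoints.
    """
    deg = {v: len(set(adj[v])) for v in adj}
    counts = {}
    for u in adj:
        for v in adj[u]:
            counts[frozenset((u, v))] = min(deg[u], deg[v])
    return counts
```

Under this rule an edge between two high-degree ASes receives many parallel links, while an edge to a degree-1 stub keeps a single link.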

Chapter outline. In Section 7.1, we formalize our analytical approach and discuss our simulation
set-up. The analysis is presented in Section 7.2. Section 7.3 presents the results from our
simulations. In Section 7.4, we discuss the implications of our results on network design and present
mechanisms to alleviate the poor congestion scaling of the Internet graph. In Section 7.5, we survey
past work on modeling the Internet and results for deriving congestion scaling properties of general
graphs. Finally, in Section 7.6, we summarize the key observations in this chapter and present a few
caveats of our approach.

^1 There is some disagreement about whether a power law correctly models the degree distribution of the Internet graph.
However, it is widely agreed that the distribution is heavy-tailed. While our main results (specifically, simulation results)
focus on power law distributions, we believe that they hold equally well for other such heavy-tailed distributions (e.g.,
Weibull).


7.1  Methodology 

We first give a precise formulation of the problem, laying out the key questions we seek to address
via  analysis.  We  also  describe  the  simulation  set-up  for  corroborating  and  extending  our  analytical 
arguments. 

7.1.1  Problem  Statement 

Let $G = (V, E)$ be an unweighted graph, representing the Internet AS-level graph, with $|V| = n$.
Let $d_v$ denote the total degree of a vertex $v$ in $G$. We are given three key aspects pertaining to the
graph $G$: the degree distribution of the graph, the routing algorithm used by the nodes in the graph
to communicate with each other, and the traffic demand matrix determining the amount of traffic
between pairs of nodes in the graph. We give precise definitions of these three aspects, in turn,
below.

We will mostly be concerned with graphs having a power law degree distribution, defined
below.^2

Definition 1 We say that an unweighted graph $G$ has a power law degree distribution with exponent
$\alpha$, if for all integers $d$, the number of nodes $v$ with $d_v \geq d$ is proportional to $d^{-\alpha}$.

Similarly, graphs with exponential degree distribution are those in which the number of nodes $v$
with $d_v \geq d$ is proportional to $e^{-\beta d}$, for all integers $d$. Henceforth, we will refer to such graphs as
power law graphs and exponential graphs respectively.
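
Definition 1 can be checked numerically on a degree sequence. The sketch below counts the nodes with degree at least d and estimates the exponent from the slope of that count on a log-log scale. The helper names and the plain least-squares fit are illustrative assumptions; a least-squares fit is only a rough estimator and is not the procedure used in this chapter.

```python
import math
from collections import Counter

def ccdf_counts(degrees):
    """Return {d: number of nodes with degree >= d} for each observed degree d."""
    hist = Counter(degrees)
    counts, running = {}, 0
    for d in sorted(hist, reverse=True):
        running += hist[d]
        counts[d] = running
    return counts

def fit_power_law_exponent(degrees):
    """Estimate alpha as minus the least-squares slope of log(count) vs log(d).

    Assumes all degrees are positive and at least two distinct degrees occur.
    """
    counts = ccdf_counts(degrees)
    xs = [math.log(d) for d in counts]
    ys = [math.log(c) for c in counts.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope
```

For an exponential graph the same counts would instead be close to linear in d after taking only the logarithm of the count.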

Let $S$ denote a routing scheme on the graph with $S_{u,v}$ representing the path for routing traffic
between nodes $u$ and $v$. We consider two different routing schemes:

1. Shortest Path Routing: In this scheme, the route between nodes $u$ and $v$ is given by the
shortest path between the two nodes in the graph $G$. This reflects route selection in BGP where
every AS prefers paths with the least number of ASes. When there are multiple shortest paths,
we consider the maximum degree of nodes along the paths and pick the one with the highest
maximum degree. This tie-breaking rule is reflective of the typical policies employed in the
Internet: higher degree nodes are typically much larger and much more well-provisioned
providers than lower degree nodes and are in general used as the primary connection by
stub networks. In Section 7.3.3, we consider alternate tie-breaking schemes such as random
choice and favoring lower degree nodes, and show that the tie-breaking rule does not affect
our results.

2. Policy Routing: In this scheme, traffic between nodes $u$ and $v$ is routed according to BGP
policy. We classify edges as peering edges or customer-provider edges (that is, one of the
ASes is a provider of the other). These commercial relations between ISPs are known to give
rise to “valley-free” routing, in which each path contains a sequence of customer to provider
edges, followed by at most one peering edge, followed by provider to customer edges. The key
difference between policy routing and shortest path routing, then, is that in the former case,
we are restricted to choosing between multiple shortest path routes which are all compliant
with valley-free routing. Our goal in studying these two forms of routing is to understand if
commercial policies between ISPs significantly change our observations regarding network
congestion.

^2 There is some disagreement about whether a power law correctly models the degree distribution of the AS-level
graph. However, it is widely agreed that the distribution is heavy-tailed. While our main results (specifically, simulation
results) focus on power law distributions, we believe that they hold equally well for other such heavy-tailed distributions
(e.g., Weibull).
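
The valley-free condition used in policy routing can be sketched as a small state machine over the edge types along a path. The function name `is_valley_free`, the labels `"c2p"`, `"peer"` and `"p2c"`, and the dictionary `rel` keyed by directed node pairs are assumptions made for this illustration.

```python
def is_valley_free(path, rel):
    """Check that a path is uphill (customer-to-provider) edges, then at most
    one peering edge, then downhill (provider-to-customer) edges.

    path: list of nodes.
    rel:  dict mapping a directed pair (u, v) to "c2p", "peer" or "p2c".
    """
    phase = 0  # 0 = still climbing, 1 = crossed the peering edge, 2 = descending
    for u, v in zip(path, path[1:]):
        kind = rel[(u, v)]
        if kind == "c2p":
            if phase != 0:
                return False      # climbing again after a peer or downhill edge
        elif kind == "peer":
            if phase != 0:
                return False      # second peering edge, or peering after descent
            phase = 1
        elif kind == "p2c":
            phase = 2
        else:
            raise ValueError(f"unknown edge label: {kind!r}")
    return True
```

Policy routing in the sense of the text would then choose among the shortest paths for which this check succeeds.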

A traffic vector $\tau$ is a vector containing $\binom{n}{2}$ non-negative terms, with the term corresponding to
$(u, v)$ signifying the amount of traffic between the nodes $u$ and $v$. The congestion on an edge $e$ due
to traffic vector $\tau$ and routing scheme $S$ is given by the total amount of traffic that uses
the edge $e$: $C_{\tau,S}(e) = \sum_{(u,v) : e \in S_{u,v}} \tau(u, v)$.

We define the edge congestion due to traffic vector $\tau$ and routing scheme $S$ to be the maximum
congestion on any edge in the graph:

$$\text{EDGE-CONGESTION}_{\tau,S}(G) = \max_{e \in E} C_{\tau,S}(e)$$
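
As a rough illustration of these definitions, the sketch below computes the per-edge congestion under shortest path routing, breaking ties in favor of the path with the highest maximum degree. The function names, the adjacency-dictionary layout, and the choice to route each unordered pair along the tree rooted at its smaller endpoint are assumptions of this sketch. Any ties that remain are resolved by iteration order.

```python
from collections import deque

def shortest_path_tree(adj, root):
    """BFS tree from root; among equal-length paths, prefer the one whose
    highest-degree node is largest. Returns a parent map."""
    deg = {v: len(adj[v]) for v in adj}
    dist = {root: 0}
    parent = {root: None}
    best = {root: deg[root]}  # max degree on the chosen path to each node
    queue = deque([root])
    while queue:
        p = queue.popleft()
        for w in adj[p]:
            candidate = max(best[p], deg[w])
            if w not in dist:
                dist[w], parent[w], best[w] = dist[p] + 1, p, candidate
                queue.append(w)
            elif dist[w] == dist[p] + 1 and candidate > best[w]:
                parent[w], best[w] = p, candidate
    return parent

def edge_congestion(adj, traffic=lambda u, v: 1):
    """Return {edge: load}, summing traffic(u, v) over all unordered pairs
    whose route uses the edge. The default traffic is one unit per pair."""
    load = {}
    nodes = sorted(adj)
    for i, u in enumerate(nodes):
        parent = shortest_path_tree(adj, u)
        for v in nodes[i + 1:]:
            if v not in parent:
                continue  # v is unreachable from u
            w = v
            while parent[w] is not None:
                e = frozenset((w, parent[w]))
                load[e] = load.get(e, 0) + traffic(u, v)
                w = parent[w]
    return load
```

The edge congestion of the graph is then `max(edge_congestion(adj).values())`.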

Our goal is to quantify the congestion in a graph with power law degree distribution, for shortest path
and policy routing schemes, due to different traffic vectors. Specifically, we consider the following
three traffic vectors:

1. Any-2-any: This corresponds to the all 1s traffic vector, with a unit traffic between every pair
of ASes. While very simplistic, this model is amenable to analysis and provides a baseline
for comparison against more complex traffic vectors.

2. Leaf-2-leaf: In order to define this model, we classify nodes in the graph as stubs and
carriers. Stubs are nodes that do not have any customers. In other words, consider directing
all customer-provider edges in the graph from the customer to the provider. Peering edges are
considered to be bi-directed edges. Then, vertices with no incoming edges (corresponding to
ASes with no customers) are called stubs or leaves in the graph. In this model, there is a unit
of traffic between every pair of stubs in the graph. No other AS sources or sinks traffic on its
own.

3. Clout: This model approximates the fact that “well-placed” sources, that is, sources which
have a high degree and are connected to high degree neighbors, are likely to transmit larger
amounts of traffic than other sources. Accordingly, in this case, $\tau(u, v) = f(d_u, c_u)$, where
$u$ and $v$ are stubs, $c_u$ is the average degree of the neighbors of $u$ and $f$ is an increasing
function. As in the previous case, there is no traffic between nodes that are not stubs. In what
follows, we only use the function $\tau(u, v) = f(d_u, c_u) = d_u c_u$ for stubs $u$, $v$. This is the most
sophisticated of the traffic vectors we consider.
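
The clout traffic vector can be sketched as follows, assuming an adjacency-dictionary graph and a precomputed set of stubs. The function name `clout_traffic` and the returned closure are illustrative choices. Note that the product of a node's degree and its average neighbor degree equals the sum of its neighbors' degrees.

```python
def clout_traffic(adj, stubs):
    """Return a function tau(u, v) giving d_u * c_u for distinct stubs u, v
    and 0 otherwise, where c_u is the average degree of u's neighbors."""
    deg = {v: len(adj[v]) for v in adj}

    def tau(u, v):
        if u == v or u not in stubs or v not in stubs:
            return 0.0
        c_u = sum(deg[w] for w in adj[u]) / deg[u]
        return deg[u] * c_u

    return tau
```

The any-2-any and leaf-2-leaf vectors correspond to replacing the product by the constant 1, over all pairs and over stub pairs respectively.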

Admittedly, our choice of the models for Internet routing, as well as those for Internet traffic
matrices, are somewhat unrealistic. We do try to model real phenomena, such as popularity of some
ASes, but make no claims to the realism of our models. However, we still use them in our analysis
for reasons of simplicity and for lack of realistic Internet-wide traffic traces.

7.1.2  Simulation  Set-up 

Our simulations serve two purposes: (1) to corroborate our theoretical results, and (2) to
characterize the congestion in more realistic network models than those considered in our analysis.

Our simulations are run on two different sets of graphs. The first set of graphs contains maps of
the Internet AS topology collected at 6-month intervals between Nov. 1997 and April 2002, available
at [78]. The number of nodes in any graph in this set is at most 13000, the maximum corresponding
to the April 2002 set. The second set of graphs contains synthetic power law graphs generated by
Inet-3.0 [115]. In this set, we generate graphs of sizes varying from $n = 4000$ to 50000 nodes.^3

As mentioned earlier, in order to implement the leaf-2-leaf and clout models of communication,
we need to identify stubs in the network (note that these might have a degree greater than 1).
Additionally, in order to implement policy routing, we need to classify edges as peering or
non-peering edges. To do this for the real AS graphs, we employ the relationship inference algorithms of
Gao  [42]  to  label  the  edges  of  the  graphs  as  peering  or  customer-provider  edges.  These  algorithms 
use  global  BGP  tables  [23]  to  infer  relationships  between  nodes.  Then,  we  use  these  relationships 
to  identify  stubs,  as  nodes  that  are  not  providers  of  any  other  node.  Henceforth,  we  shall  refer  to  the 
real  AS  graphs  as  accurately  labeled  real  graphs  (ALRs). 

Labeling edges and identifying stubs in the synthetic graphs of Inet is trickier since we
do not have the corresponding BGP information. We will refer to synthetic graphs, labeled using
the algorithms described below, as heuristically labeled synthetic graphs (HLSs). We use different
algorithms for classifying nodes (this is key to implementing leaf-2-leaf communication) and edges
(this is key to implementing policy routing in synthetic graphs). We discuss these next.

Stub identification. We identify stubs in synthetic graphs as follows: For any edge $e = (v_1, v_2)$,
we assign $v_1$ to be the provider of $v_2$ whenever $\text{degree}(v_1) \geq \text{degree}(v_2)$. Notice that we do
not explicitly identify peering edges (although edges between nodes of identical degree will be
bidirectional). We then identify stubs in graphs labeled as above.
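
A minimal sketch of this stub-identification heuristic follows. It reads the degree comparison as "greater than or equal", so that an edge between nodes of identical degree makes each endpoint a provider of the other. This reading is an assumption based on the remark about bidirectional edges, and the function name `identify_stubs` is illustrative.

```python
def identify_stubs(adj):
    """Return the nodes that are not a provider of any neighbor.

    adj: dict mapping node -> iterable of neighbors (undirected graph).
    A node u counts as a provider of neighbor v when deg(u) >= deg(v).
    """
    deg = {v: len(adj[v]) for v in adj}
    providers = {u for u in adj for v in adj[u] if deg[u] >= deg[v]}
    return set(adj) - providers
```

A stub found this way can have degree greater than 1, as long as every one of its neighbors has strictly higher degree.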

^3 In all our simulations, for any metric of interest, for each $n$, we generate 5 different graphs of $n$ nodes (by varying
the random seed used by the Inet graph generator) and report the average of the metric on the 5 graphs.

(a) Accuracy of Stub Identification (b) Accuracy of Edge Labeling

Figure 7.1: Accuracy of heuristics: The graph on the left shows the accuracy of our simple stub identification
algorithm. The graph on the right shows the error in the maximum congestion due to our machine-learning based
edge-classification algorithm.

We test the accuracy of this stub-identification algorithm on real AS graphs by comparing the
labels produced by our algorithm to the true labels of ALRs, and compute the fraction of false
positives and false negatives^4 in these. The accuracy results are shown in Figure 7.1. Note that our
simple algorithm has a very low error rate. Notice that the inference algorithms of Gao [42] have
some error intrinsically and hence some of the labels on the ALRs might actually be inaccurate.

Edge classification. Simply considering all edges in the graph to be customer-provider edges, as
done above, is not useful for the purposes of edge classification. Specifically, it results in a
significant error on the maximum congestion in real graphs employing policy routing (results omitted
for brevity). In order to improve the accuracy of labeling edges, we resort to machine learning
algorithms. However, coming up with a good machine learning algorithm for the classification is a
challenging task, because there is a huge bias toward customer-provider edges in the graphs (roughly
95% of the edges are customer-provider edges). We use the 3-Nearest Neighbor [75] algorithm for
classifying edges as peering or non-peering: each edge in the unlabeled graph is classified as a
peering edge if among the three edges most similar to it in the labeled graph, at least two are peering
edges. Similarity between edges is judged based on the degrees of their respective end points and
neighboring vertices. We measure the accuracy of the procedure by applying it to real graphs and
then comparing the classification with true labels.
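
The 3-Nearest Neighbor vote can be sketched as below. The feature vector used here (for example, the two endpoint degrees), the Euclidean distance, and the function name `classify_edge` are assumptions of this sketch. The text states only that similarity is judged from the degrees of the end points and neighboring vertices.

```python
import math

def classify_edge(features, labeled, k=3):
    """Label an edge by majority vote among its k most similar labeled edges.

    features: tuple of numbers describing the unlabeled edge.
    labeled:  list of (feature_tuple, label) pairs, label being "peer" or
              "customer-provider".
    """
    nearest = sorted(labeled, key=lambda item: math.dist(features, item[0]))[:k]
    peers = sum(1 for _, label in nearest if label == "peer")
    # With k = 3 this is the "at least two of three" rule from the text.
    return "peer" if peers >= k // 2 + 1 else "customer-provider"
```

Because roughly 95% of labeled edges are customer-provider edges, a vote of this kind is biased against the peering class, which matches the low peering accuracy reported for the procedure.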

Our machine learning algorithm gives only 20% accuracy on peering edges and about 95%
accuracy on customer-provider edges. However, for the purposes of computing the worst congestion
in the graph, this low accuracy of labeling is, in fact, enough. Indeed, as shown in Figure 7.1(b),
labeling real graphs using our algorithm results in an error of less than 10% in the worst congestion
(while employing policy routing) in comparison with the congestion computed on ALRs. More
importantly, the trends in congestion growth are identical in the two cases.

^4 False positives are nodes that are identified as stubs by the algorithm, but are not stubs in the ALR. False negatives
are stubs in the ALR that are not identified as stubs by the algorithm.

Other topologies. In addition to power law graphs, we also study congestion in power law trees and
exponential topologies. A comparison of the former with power law graphs gives an insight into the
significance of density of edges in the graph. The latter model is interesting because most generative
models for power law topologies result in exponential distributions in the “fringe” cases [36]. Our
power law tree topologies evolve according to the Preferential Connectivity model [21]. To generate
exponential topologies, we modify Inet-3.0 to generate an exponential degree distribution first and
then add edges in Inet’s usual way. For a given $n$, the exponent $\beta$ for the exponential graphs on $n$
nodes is chosen such that the total number of edges in the exponential graph is very close to that of
the corresponding power law graph on $n$ nodes.^5 Note that due to a lack of real data for exponential
graphs, we do not have a good way of labeling edges and nodes in them. Therefore, we do not
perform experiments with policy routing or the leaf-2-leaf and clout traffic models for them.

7.2  Analytical  Results 

In this section, we show that the expected maximum edge congestion in a power law graph,
specifically the congestion on the edge between the two highest degree nodes, grows as $\Omega(n^{1+1/\alpha})$ with
$n$, when we route a unit flow between all pairs of vertices over the shortest path between them. We
consider the Preferential Connectivity Generative Model of Barabasi et al. [21]. This model uses
a fixed constant parameter $k$. The model starts with a complete graph on $k + 1$ nodes. This set of
nodes is called the core of the graph. The graph grows in time-steps. Let the graph at time $i$ be
denoted $G^i$. At time step $i + 1$, one node is added to the network. This node picks $k$ nodes at random
from $G^i$ and connects to them. Each vertex $v$ has a probability $d_v^i / D^i$ of getting picked, where $d_v^i$ is the
degree of the vertex at time $i$, and $D^i$ is the total degree of all nodes at time $i$. At the end of $n$ steps,
with $k = 3$, this process is known to generate a power law degree distribution. Also, it is easy to see
that in the resulting power law graph (for that matter in any power law graph), the maximum node
degree is $n^{1/\alpha}$.
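
The growth process can be sketched as follows. This is a minimal illustration, not the generator used in the experiments. It assumes that each new node picks k distinct targets, that `n` counts the total number of nodes including the core, and that degree-proportional sampling is done by drawing uniformly from a list holding every node once per incident edge. The function name and `seed` parameter are illustrative.

```python
import random

def preferential_connectivity(n, k=3, seed=0):
    """Grow a graph on n nodes: start from a complete graph on k + 1 core
    nodes, then attach each new node to k existing nodes chosen with
    probability proportional to their current degree."""
    rng = random.Random(seed)
    core = range(k + 1)
    adj = {v: {u for u in core if u != v} for v in core}
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from the list picks a node with probability degree / total degree.
    endpoints = [v for v in core for _ in range(k)]
    for new in range(k + 1, n):
        targets = set()
        while len(targets) < k:          # redraw until k distinct targets
            targets.add(rng.choice(endpoints))
        adj[new] = set()
        for t in sorted(targets):
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
    return adj
```

The two highest-degree nodes of a graph produced this way are typically core nodes, which is the situation the analysis below considers.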

^5 We employ heuristic hill-climbing to estimate the value of the exponent $\beta$ that minimizes error in the number of
edges.

In order to show a lower bound on the congestion of a power law graph, our plan is roughly as
follows. We consider the edge between the two highest degree nodes in the core, $s_1$ and $s_2$. Call
this edge $e^*$. For every vertex $v$ in the graph, we consider the shortest path tree rooted at vertex
$v$. We show that in expectation, $\Omega(n)$ such trees contain the edge $e^*$. Moreover, in these trees, the
expected number of nodes in the subtree rooted at edge $e^*$ is at least $\Omega(n^{1/\alpha})$. This gives us the lower
bound in the following way: the routes taken by each connection are precisely those defined by the
above shortest path trees; thus the congestion on any edge is the sum of congestions on the edge in
these shortest path trees. Now, as described above, in $\Omega(n)$ shortest path trees, the congestion on
edge $e^*$ is at least $\Omega(n^{1/\alpha})$. Therefore, the total congestion on edge $e^*$ is at least $\Omega(n^{1+1/\alpha})$. Note that
$e^*$ is not necessarily the most congested edge in the graph, so the maximum congestion could be
even worse than $\Omega(n^{1+1/\alpha})$. We get the following theorem:

Theorem 1 The expected value of the maximum edge congestion in a power law graph with
exponent $\alpha$ grows as $\Omega(n^{1+1/\alpha})$ with $n$, when we route a unit flow between all pairs of vertices over the
shortest path between them.


In the following, the distance between two nodes refers to the number of hops in the shortest
path between the two nodes. We make a few technical assumptions. We assume that $1 < \alpha < 2$, and
$s_1$ and $s_2$ are the highest degree nodes in the graph. For reasonably “small” numbers $h$, we assume
that for any node $v$ in the graph, the number of nodes within distance $h$ of $v$ is less than the number
of nodes within distance $h$ of $s_1$. In other words, $s_1$ is centrally placed in the graph. Here, “small”
refers to a distance around $s_1$ that contains fewer than half the nodes. These assumptions are justified
by experimental evidence and some prior analysis [41] of the preferential connectivity generative
model.

We  begin  with  a  technical  lemma. 


Lemma 1 Let $r$ be the maximum integer for which at least $\frac{n}{2}$ vertices lie at a distance $r + 1$ or
beyond from $s_1$. Then, $\Omega(n)$ nodes lie within distance $r - 1$ of every node in the core of the graph.
In particular, for any node in the core, $\Omega(n)$ nodes lie at a distance exactly $r - 1$ from it.


Proof: We prove that at least $\Omega(n)$ nodes lie within a distance $r - 2$ of $s_1$. Then, since all vertices
in the core are neighbors of $s_1$, these $\Omega(n)$ nodes lie within a distance $r - 1$ of any vertex in the
core of the graph. We begin by showing that at least $\Omega(n)$ nodes lie within a distance $r$ of $s_1$, and
then extend this to nodes at distance $r - 1$ and $r - 2$. Let level $i$ denote the set of nodes at distance
exactly $i$ from $s_1$.

Remove from the graph all vertices that are at level $r + 2$ or higher. The remaining graph has at
least $\frac{n}{2}$ vertices, by the definition of $r$. Now, assume that there are at least $\frac{n}{4}$ vertices at level $r + 1$;
otherwise, we already have more than $\frac{n}{4}$ nodes in levels 0 through $r$, implying that $\Omega(n)$ nodes lie within
distance $r$ of $s_1$.

Now, let the number of nodes at level $r$ be $x$. All the nodes in level $r + 1$ in the residual graph
are connected to nodes in level $r$. So, their number is at most the size of the neighbor set of level $r$.
Now, in the best possible case, the nodes in level $r$ could be the highest degree nodes in the graph.
In this case, the minimum degree of any node in level $r$ is given by $y$ with $n y^{-\alpha} = x$. We get

$$y = \left(\frac{n}{x}\right)^{1/\alpha}$$

Now, the size of the neighborhood of level $r$ is at most the total degree of nodes in the level.
This is given by

$$\int_{y}^{n^{1/\alpha}} z \, \alpha n z^{-\alpha - 1} \, dz = \frac{\alpha n}{\alpha - 1}\left(y^{1-\alpha} - n^{\frac{1}{\alpha} - 1}\right) = \frac{\alpha n^{1/\alpha}}{\alpha - 1}\left(x^{1 - \frac{1}{\alpha}} - 1\right)$$

This quantity is at least $\frac{n}{4}$ by our assumption above. Thus we get that $x \geq \epsilon n$, where
$\epsilon = \left(\frac{1}{4}\left(1 - \frac{1}{\alpha}\right)\right)^{\frac{\alpha}{\alpha - 1}}$. This is a constant fraction of $n$.

Now, we can apply the same technique to compute the number of nodes at level $r - 1$ and then,
$r - 2$. We get that the number of nodes at level $r - 2$ is at least a constant fraction of $n$, with the
constant depending only on $\alpha$ and on $\epsilon$ as given above. ■

Figure 7.2: The model: A pictorial view of the graph and the set $V_r$

Let $r$ be the distance defined by the above lemma. Let $V_r$ denote the set of nodes that are
within distance $r - 1$ of every node in the core of the graph (see Figure 7.2). By Lemma 1, we have
$|V_r| = \Omega(n)$. Now, the proof of the theorem has two parts. The first shows that many trees $T_v$
corresponding to $v \in V_r$ contain the edge $e^*$.


Lemma 2 The expected number of shortest path trees $T_v$, corresponding to nodes $v \in V_r$, that
contain the edge $e^*$ is $\Omega(n)$.

Proof: Consider the tree $T_v$ for some node $v \in V_r$. This is essentially a breadth first tree starting
from $v$. If $s_1$ and $s_2$ are at the same level in the tree, then the edge $e^*$ is not contained in the tree.
On the other hand, if the nodes are at different depths in this tree, let $s_1$ be closer to $v$ without loss
of generality. In this case, one shortest path from $v$ to $s_2$ is via $s_1$ and since we break ties in favor of
paths with high degree nodes, $T_v$ will contain this path via $s_1$. This implies that $e^*$ is contained in
the tree. Thus, trees containing $e^*$ correspond to those $v$ that are not equidistant from $s_1$ and $s_2$. We
now show that there are $\Omega(n)$ nodes $v \in V_r$ that are not equidistant from $s_1$ and $s_2$. This implies
the result.

First, observe that if we pick a random node in the graph, then conditioned on the fact that this
node lies at a distance $d - 1$, $d$ or $d + 1$ from $s_2$, the probability that this node lies at distance $d$
from $s_2$ is at most a constant (bounded away from 1). This is because, using an argument analogous
to that in Lemma 1, we can show that the number of nodes at distance $d - 1$ from $s_2$ is a constant
fraction of the number of nodes at distance $d$.

Now, consider the nodes at distance $r - 2$ from $s_1$. These are at least $\Omega(n)$ in number (Lemma 1)
and lie in $V_r$. Given that a node $v$ is picked from this set, $v$ is at a distance $r - 3$, $r - 2$ or $r - 1$ from
$s_2$. By the above argument, the probability that this node lies at distance $r - 2$ from $s_2$ is at most a
constant. Thus $\Omega(n)$ nodes in this set are not at distance $r - 2$ from $s_2$ and we are done. ■

Next we prove that in any tree $T_v$ ($v \in V_r$) containing $e^*$, $e^*$ has a high congestion.

Lemma 3 Let $T_v$ be a shortest path tree, corresponding to $v \in V_r$, that contains the edge $e^*$. Then
the expected congestion on edge $e^*$ in this tree is $\Omega(n^{1/\alpha})$.

Proof: Without loss of generality, let $s_1$ be closer to $v$ than $s_2$. We show that the degree of $s_2$ in $T_v$
is $\Omega(n^{1/\alpha})$. This implies the result. Let level $i$ denote the set of nodes at distance $i$ from $v$ in the
tree.

Let $d$ be the distance between $v$ and $s_2$. All neighbors of $s_2$ lie in levels $\geq d - 1$ in the tree.
Note that $d \leq r - 1$. Therefore by our assumption, the number of nodes lying at levels $\geq d + 1$ in
the tree is at least the number of nodes at distance $r$ or greater from $s_1$. This number is at least $\frac{n}{2}$
by the definition of $r$. Let $W$ denote the set of nodes that lie at levels $\geq d - 1$ in the tree, and that
arrived in the graph after step $\frac{n}{4}$. Note that there are at least $\frac{n}{4}$ nodes at level $d + 1$ or higher that are
in set $W$. Therefore, a constant fraction of the nodes in $W$ lie at levels $\geq d + 1$ in the tree.

First observe that the probability that a node entering the graph at time step $t$ attaches to $s_2$ is
roughly $t^{\frac{1}{\alpha} - 1}$. This probability increases as the graph becomes larger and larger, as this is related.
By removing the first quarter of the nodes entering the graph from consideration, and using the fact
that these nodes are less likely to attach to $s_2$ than nodes arriving later, we conclude that the number
of neighbors of $s_2$ that arrive after step $n/4$ is at least 3/4th of the total degree of $s_2$.

Now all neighbors of $s_2$ lie at levels $\geq d - 1$ in the tree. Then, by the observation in the previous
paragraph, we have that at least 3/4th of the neighbors of $s_2$ lie in the set $W$. Note that when a
node in $W$ entered the graph, the size of the graph varied between $\frac{n}{4}$ and $n$ nodes. The probability
that this node attached to $s_2$ varied between $n^{\frac{1}{\alpha} - 1}$ and $\left(\frac{n}{4}\right)^{\frac{1}{\alpha} - 1} < 4 n^{\frac{1}{\alpha} - 1}$. Thus each node in $W$ is
roughly equally likely to attach to $s_2$ (within a factor of 4).

Now the degree of $s_2$ in the tree is at least the number of its neighbors in $W$ that lie at levels
$\geq d + 1$. Using the fact that a constant fraction of the nodes in $W$ lie at levels $\geq d + 1$ in the tree,
we get that a constant fraction of the neighbors of $s_2$ lie at levels $\geq d + 1$ in the tree, in expectation.
The result follows from the fact that the degree of $s_2$ is $\Omega(n^{1/\alpha})$. ■

Figure 7.3: Simulation support for the analytical model: Figure (a) shows the fraction of shortest path trees
that do not contain the edge $e^*$. Figure (b) plots the ratio of degrees of $s_1$ and $s_2$ in a random shortest path tree to
their degrees in the graph.

Figure 7.4: Maximum edge congestion: Plotted as a function of $n$ (number of nodes), in Inet-3.0 generated graphs,
with $\alpha = 1.23$. The figure also plots four other functions to aid comparison.

7.2.1  Experimental  Support 

In this section, we report experimental results to show that the theoretical results obtained above
hold not just for the Preferential Connectivity Model, but also for Internet-like AS graphs generated
by Inet-3.0. Unfortunately, the graphs generated by Inet-3.0 have different values of $\alpha$ for different
$n$. This is consistent with the observed properties of the Internet’s AS graph, that $\alpha$ decreases with
time. (We discuss this in further detail in the subsequent section.) In order to validate our theoretical
results and observe the asymptotic behavior of congestion for a fixed value of $\alpha$, we modify the
Inet-3.0 code so that it always uses a fixed value of $\alpha = 1.23$, instead of recalculating it for every value of
$n$. Each reported value is an average over multiple runs of the simulation, corresponding to different
random seeds used for generating the graphs.

Figure 7.3(a) plots the fraction of nodes that are equidistant from $s_1$ and $s_2$. Note that this
fraction always lies below 0.4 and is consistent with our result in Lemma 2 that at least a constant
fraction of the trees, 3/5 in this case, contain the edge $e^*$. Figure 7.3(b) compares the degrees of
the two highest degree nodes in the graph to their corresponding degrees in the shortest path tree
corresponding to some random node $v$. We find that the ratio of the two degrees for $s_1$ is consistently
above 0.9. Similarly, the ratio of the two degrees for $s_2$ is always above 0.8 and increasing. This is
consistent with the findings of Lemma 3.

(a) Maximum edge congestion (b) Distribution of edge congestion

Figure 7.5: Edge congestion with shortest path routing and any-2-any communication: The figure on the left shows
the maximum edge congestion. The figure on the right shows the distribution of congestion over all links, with the
number of links normalized to 1 in each case. The figure on the left also plots the worst congestion for exponential
graphs and preferential connectivity trees.

Finally, we plot the maximum congestion in graphs generated by Inet-3.0, as a function of the
number of nodes in the graph, in Figure 7.4. Note that the maximum congestion scales roughly as
$n^{1.8}$, which is exactly $n^{1+1/\alpha}$ for $\alpha = 1.23$. This corroborates our finding in Theorem 1.
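
As a quick arithmetic check, the exponent that Theorem 1 predicts for the fixed value $\alpha = 1.23$ used in these runs works out as follows (rounded to two decimal places):

```latex
1 + \frac{1}{\alpha} \;=\; 1 + \frac{1}{1.23} \;\approx\; 1 + 0.81 \;=\; 1.81
```

This agrees with the observed scaling exponent of roughly 1.8.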

7.3  Simulation  Results 

In this section, we present the results from our simulation study over Inet-generated graphs.
Henceforth, we shall use the graphs generated by Inet-3.0 as is, that is, we do not alter the way Inet
chooses $\alpha$ to depend on $n$. In what follows, we first show results for shortest-path routing, followed
by policy-based routing. In both cases, we first present results for the any-2-any communication
model, then for the leaf-2-leaf model and finally for the clout model.

7.3.1  Shortest-Path  Routing 

Figure 7.5(a) shows the maximum congestion in power law graphs generated by Inet-3.0 as a function
of the number of nodes. We use the any-2-any model of communication here. From the trend
in the graph, it is clear that the maximum congestion in Internet-like power law graphs scales as
$n^{1+\Omega(1)}$ or worse. Notice also that the slope of the maximum congestion curve is slightly increasing.
This can be explained as follows. As mentioned earlier, Inet-3.0 chooses the exponent $\alpha$ of
the power law degree distribution as a function of the number of nodes $n$: $\alpha = at + b$, where
$t = \frac{1}{s}\log\frac{n}{n_0}$, $a = -0.00324$, $b = 1.223$, $s = 0.0281$ and $n_0 = 3037$†. Notice that the absolute value

† $a$, $b$ and $s$ are empirically determined constants; $n_0$ is the number of ASes in the Internet in November 1997.

(a) Maximum edge congestion (b) Distribution of edge congestion

Figure 7.6: Edge congestion with shortest path routing and leaf-2-leaf communication: The figure on the left shows
the maximum edge congestion. The figure on the right shows the distribution of congestion over all links (again
normalized).

of $\alpha$ decreases as $n$ increases, and so, as our lower bound of $\Omega(n^{1+1/\alpha})$ suggests, the slope of the
function on a log-log plot should steadily increase. In fact, around $n = 28000$, $\alpha$ becomes less than
1 and at this point we expect the curve to scale roughly as $O(n^2)$, which is the worst possible rate
of growth of congestion.

The figure also shows the maximum congestion in power law trees and exponential graphs. The
power law trees we generate have the exponent $\alpha$ between 1.66 and 1.8, the value increasing with
the number of nodes in the tree. These exponents are significantly higher than those of the corresponding
power law graphs. Notice that the edge congestion on power law trees grows much faster
than on graphs, which is expected since trees have far fewer edges. Our lower bound on the
maximum congestion, which holds equally well for trees satisfying power law degree distributions,
predicts the slope of the curve for trees to be at least 1.5, which is consistent with the above graph.

On the other hand, we notice that edge congestion in exponential graphs is much smaller compared
to power law graphs. In fact, edge congestion in exponential graphs has an approximately
linear growth. This could be explained intuitively as follows: Recall that for each $n$, we choose the
exponent $\beta$ of the exponential distribution so as to match the total number of edges of the corresponding
$n$-node power law graph. Because the power law distribution has a heavier tail compared
to the exponential distribution, the latter has more edges incident on low degree nodes. Consequently,
low degree vertices in an exponential graph are better connected to other low degree vertices.
Edges incident on low degree nodes "absorb" a large amount of congestion, leading to lower
congestion on edges incident on high degree nodes. As $n$ increases, the degree distribution becomes
more and more even, resulting in a very slow increase in congestion.

In Figure 7.5(b), we show the congestion across all links in a power law graph. Notice that at
higher numbers of nodes, the distribution of congestion becomes more and more uneven. The corresponding
set of graphs for the leaf-2-leaf communication model is shown in Figure 7.6. The worst
congestion is consistently about 0.8 times the worst congestion for the any-2-any model (not explicitly
shown in the graph). The congestion across all edges, plotted in Figure 7.6(b), also displays a
trend similar to that for the any-2-any model: the distribution becomes more uneven as the number of
nodes increases.

(a) Maximum edge congestion (b) Distribution of edge congestion

Figure 7.7: Edge congestion with shortest path routing and clout model of communication: The figure on the left
shows the maximum edge congestion. The figure on the right shows the distribution of congestion over all links
(again normalized).

The results for the clout model are more interesting, with the resulting maximum congestion in
the graph scaling much worse than before. Indeed, as Figure 7.7(a) shows, the maximum congestion
scales worse than n^[…]. This is because the total traffic in the graph also grows roughly as O(n^[…]).
Again, as with the any-2-any model, the smaller absolute values of $\alpha$ in the graphs generated by
Inet-3.0 for larger values of $n$ are the reason for the increasing slope of the curve. The graph of the
congestion across all edges in this model, shown in Figure 7.7(b), is equally interesting. Compared
to Figure 7.6(b) of the leaf-2-leaf model, Figure 7.7(b) looks very different: the unevenness in
congestion is much more pronounced in the clout model of communication. In other words, the
non-uniform traffic demand distribution only seems to exacerbate the already poor congestion scaling of
the Internet-like graphs.

7.3.2  Policy-Based  Routing 

Figure 7.8 shows the maximum edge congestion for the three communication models, when
policy-based routing is used. For the any-2-any and leaf-2-leaf models, shown in Figure 7.8(a), the
maximum edge congestion scales almost identically to that for shortest path routing (compare with
Figures 7.5(a) and 7.6(a)). However, somewhat surprisingly, for the clout model (Figure 7.8(b)),
congestion under policy-based routing scales only as […] compared to over […] for shortest-path routing.

Figure 7.9(a) compares maximum congestion obtained for policy routing to that for shortest
path routing. Notice that the two curves are almost overlapping, although policy routing seems to
be slightly worse when the graph is small and gets better as the graph grows larger. This observation
can be explained as follows: policy routing disallows certain paths from being used and could, in
general, force connections to be routed over longer paths. This would increase the overall traffic in

Figure 7.8: Policy-based routing: Maximum edge congestion with policy-based routing in HLSs.

Figure 7.9: Policy-based vs shortest path routing: Comparison of edge congestion for shortest path and policy
based routing in the any-2-any model


the network, leading to higher congestion, especially for a smaller graph size. However, as the size of
the graph grows, there are more and more shortest paths available. As a result, the constraints placed
by policy-based routing might not have any significant impact on the path lengths in the graph. In
fact, at higher numbers of nodes, policy routing could provide better congestion properties, albeit
only marginally different, than shortest path routing. This is because while shortest path routing
always picks paths that go over high degree nodes, a fraction of these paths might not be allowed
by policy routing as they could involve more than one peering edge. In this case, policy routing
moves traffic away from the hot-spots, thereby partially alleviating the problem. We believe that
this is also the reason for the congestion scaling in the clout model to be better when considering
policy routing, as opposed to shortest-path routing.

In order to verify that the above observation is not just an artifact of our machine learning-based
labeling algorithms, we plot the same curves for ALRs in Figure 7.9(b). These display exactly the
same trend: policy routing starts out being worse than shortest path, but gets marginally better as
$n$ increases. To summarize, policy routing does not worsen the congestion in Internet-like graphs,
contrary to what common intuition might suggest. In fact, policy routing might perform marginally

Figure 7.10: Tie-breaking rules in shortest-path routing: $\alpha = 1.23$. The figure plots the three different variations
of breaking ties in shortest path routing.

better  than  shortest  path  routing. 

7.3.3  Shortest  Path  Routing  Variations 

As  mentioned  in  Section  7.1.1,  in  the  shortest  path  routing  scheme,  whenever  there  are  multiple 
shortest  paths  between  two  nodes,  we  pick  the  path  that  contains  higher  degree  nodes  to  route  the 
flow  between  them.  It  may  appear  that  the  poor  congestion  properties  of  power  law  graphs  are 
a  result  of  this  tie  breaking  rule,  and  an  alternate  rule  that  favors  low  degree  nodes  may  perform 
better by alleviating the congestion on high degree nodes. In order to confirm that our results are
robust across various tie-breaking rules, we performed the experiments with two variants of the
tie-breaking rule: favoring paths that contain lower degree nodes, and choosing a random shortest path
when  there  is  a  choice  of  more  than  one. 

For these experiments, we set $\alpha$ to be a constant value of 1.23 in Inet 3.0 and compare the
resulting relations between maximum edge congestion and the number of nodes. As Figure 7.10
depicts, there is no noticeable difference between the three types of tie-breaking methods. The same
holds true for leaf-2-leaf and clout models of traffic (results are omitted for brevity). This is because
very  few  vertex  pairs  have  multiple  shortest  paths  between  them.  We  thus  conclude  that  our  scheme 
of  breaking  ties  by  favoring  paths  containing  higher  degree  nodes  does  not  skew  our  results. 
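
The experiment above can be sketched on a toy scale as follows. This is a minimal illustration, not the simulator used in this chapter: it routes one unit of demand between every unordered pair of nodes (the any-2-any model) along a shortest path and reports the most heavily loaded edge. The tie-breaking rule is applied hop by hop, choosing among equally short predecessors by their degree; this local rule, the function names and the example graphs are all assumptions made for the sketch.

```python
import random
from collections import deque


def shortest_path_parents(adj, src, prefer, rng):
    """BFS from src; keep one predecessor per node, chosen by the tie-break rule."""
    dist = {src: 0}
    candidates = {src: []}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                candidates[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                candidates[v].append(u)  # another equally short way to reach v
    parent = {}
    for v, preds in candidates.items():
        if not preds:
            continue  # the source has no predecessor
        if prefer == "high":
            parent[v] = max(preds, key=lambda u: (len(adj[u]), u))
        elif prefer == "low":
            parent[v] = min(preds, key=lambda u: (len(adj[u]), u))
        else:  # "random"
            parent[v] = rng.choice(preds)
    return parent


def max_edge_congestion(adj, prefer="high", seed=0):
    """Maximum edge load with unit demand between every unordered node pair."""
    rng = random.Random(seed)
    load = {}
    for src in adj:
        parent = shortest_path_parents(adj, src, prefer, rng)
        for dst in adj:
            if dst <= src or dst not in parent:
                continue  # count each pair once; skip unreachable nodes
            v = dst
            while v != src:
                u = parent[v]
                edge = (min(u, v), max(u, v))
                load[edge] = load.get(edge, 0) + 1
                v = u
    return max(load.values()) if load else 0


# Toy graphs with integer node labels (hypothetical examples).
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
for rule in ("high", "low", "random"):
    print(rule, max_edge_congestion(star, rule), max_edge_congestion(path, rule))
```

In both toy graphs every pair of nodes has a unique shortest path, so the three rules necessarily agree. This mirrors, in miniature, the explanation given above for why the rules are indistinguishable on the Inet graphs.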

7.4  Improving  the  Congestion  Scaling  Properties 

Our analytical and simulation results have shown that the maximum congestion in Internet-like
power law graphs scales rather poorly with the graph size. Our results show that edges
between high degree nodes, which are typically peering edges between backbone carriers in the
Internet core, are likely to get congested more quickly over time than other edges. In such a situation,
to enhance the scaling properties of the network, it might become necessary to either change the
routing algorithm employed by ASes in the Internet (i.e., BGP-style routing) or alter the
interconnection. Next, we address the latter issue of altering the structure of the Internet graph. Specifically,
we focus on mechanisms for increasing the parallelism in edges between neighboring nodes in the
Internet AS-graph.

Figure 7.11: Degree vs congestion: Edge congestion versus the average degree of the nodes incident on the edge
(any-2-any model with shortest path routing). The congestion is higher on edges with a high average degree.

7.4.1  Adding  Parallel  Network  Links 

We examine ways in which additional links can be placed in the network, so as to contain the effect
of bad scaling of the maximum congestion. Specifically, we consider the model in which each link
can be replaced by multiple links (between the same pair of nodes) that can share the traffic load‡.
Ideally, we would like to provide sufficient parallel links between a pair of nodes, so that the total
congestion on the corresponding edge, divided equally among these parallel links, grows at about
the same rate as the size of the network even in the worst case. The number of parallel links between a
pair of nodes may need to change as the network grows to achieve this goal. Notice that this change
does alter the degree-structure of the graph, but the alteration is only due to increased connectivity
between already adjacent nodes*. In other words, this does not require new edges between nodes
that were not adjacent before.

In fact, the network, at an AS level, already incorporates this concept of parallel links. For
example, the power law structure of the AS graph only considers the adjacency of ASes: the link
between Sprint and AT&T, for instance, is modeled by a single edge. In the real world, however, the
Sprint and AT&T ASes are connected to each other in a large number of places around the world.
Not much is known, though, about the degree of such connectivity in the Internet today.

In order to guide the addition of parallel edges between adjacent nodes, we first observe that
there is a clear correlation between the average degree and edge congestion. Figure 7.11 plots the
congestion of each edge against the average degree of the nodes on which it is incident. We show
the results for shortest path routing on an Inet-generated graph of 30000 nodes where any-2-any
communication is used. The figure shows that edges incident on high degree nodes have much
higher congestion than those incident on lower degree nodes. This suggests that a good choice for
the number of parallel links substituting any edge in the graph could depend on the degrees of the
nodes that the edge connects.

‡ For results on alternate methods of alleviating congestion, please refer to [6].

* Note that the routing is still done based on the original degrees of nodes.

Figure 7.12: Alleviating congestion: Maximum relative congestion for shortest path routing, any-2-any model,
when parallel links are added to the graph using the sum, product and max functions.

We examine several ways of adding parallel links based on the above observation. In particular,
we let the number of links between two nodes be some function of the degrees of the two nodes
and we consider the following functions: (1) sum of degrees of the two nodes, (2) product of the
degrees of the two nodes, (3) maximum of the two degrees and (4) minimum of the two degrees.
We then compute the maximum relative congestion for these functions, that is, the maximum over
all edges of the congestion on the edge divided by the number of parallel links corresponding to
that edge. The maximum congestion on Internet graphs for these models of adding new edges is
shown in Figure 7.12. Notice that, surprisingly, when parallel links are added according to any of
the above four functions, the maximum relative congestion in the graph scales linearly. This implies
that adding parallelism in the edges of Internet-like graphs according to the above simple functions
is enough to ensure that uniform scaling of link capacities (for example, based on Moore's Law-like
technology trends) can maintain uniform levels of congestion in the network and avoid persistent
hot-spots.
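
The definition of maximum relative congestion can be written down directly. The sketch below is a minimal illustration under stated assumptions: per-edge congestion and node degrees are taken as given inputs, the number of parallel links on an edge is one of the four degree functions listed above, and the names `max_relative_congestion` and `LINK_RULES` as well as the toy star graph are hypothetical.

```python
# The four candidate rules for the number of parallel links on an edge (u, v),
# each a function of the two endpoint degrees.
LINK_RULES = {
    "sum": lambda du, dv: du + dv,
    "product": lambda du, dv: du * dv,
    "max": max,
    "min": min,
}


def max_relative_congestion(edge_load, degree, rule):
    """Maximum over edges of (congestion on the edge) / (number of parallel links)."""
    links = LINK_RULES[rule]
    return max(
        load / links(degree[u], degree[v])
        for (u, v), load in edge_load.items()
    )


# Toy example: a star with center 0 and four leaves under any-2-any traffic.
# Each edge carries the leaf's demand to the center plus its demand to the
# three other leaves, i.e. 4 units.
degree = {0: 4, 1: 1, 2: 1, 3: 1, 4: 1}
edge_load = {(0, leaf): 4 for leaf in (1, 2, 3, 4)}
for rule in LINK_RULES:
    print(rule, max_relative_congestion(edge_load, degree, rule))
```

Because routing is still computed on the original graph, the edge loads do not change when links are added; only the divisor does. This is why the four rules can be compared on a single set of simulated loads.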

7.5  On  Networking  Modeling  and  Congestion  Scaling 

There have been several theoretical studies aimed at studying the properties of large-scale,
Internet-like graphs, as well as those analyzing congestion scaling properties of general graphs. In this
section, we present a brief overview of some of these studies.

Of studies aiming to characterize properties of Internet-like graphs, one class has proposed
various models of graph evolution that result in a power law degree distribution. Notable examples
include the power law random graph model of Aiello et al. [1], the bi-criteria optimization model
of Fabrikant et al. [36] and the Preferential Connectivity model of Barabási and Albert [21, 12].
Another class of studies in this category [37, 86, 109] is aimed at analyzing the properties of power
law graphs. However, most of these are based on inferences drawn from measurements of real data.
The primary application of this latter class of studies is to construct realistic generators [73, 115,
109] for Internet-like graphs.

The problem of characterizing congestion in graphs, and specifically designing routing schemes
that minimize congestion, has been studied widely in approximation and online algorithms. The
worst congestion in a graph is inversely related to the maximum concurrent flow that can be achieved
in the graph while obeying unit edge capacities. The latter is, in turn, related to a quantity called
the cut ratio of the graph. Aumann et al. [17] characterize the relationship between maximum
concurrent flow and cut ratio: The maximum concurrent flow that can be achieved in a graph is
always within a factor of $O(\log n)$ of the cut ratio, where $n$ is the number of nodes. Okamura et
al. [82] give bounds on the cut ratio for special graphs. Algorithmic approaches to the problem (see
[64, 65] for a survey) use a multi-commodity flow relaxation of the problem to find a fractional
routing  with  good  congestion  properties  (wherein  demand  between  pairs  of  nodes  is  split  across 
multiple  paths).  Although  fairly  good  approximation  factors  have  been  achieved  for  the  problem, 
most  of  the  proposed  routing  schemes  are  not  distributed,  involve  a  lot  of  book-keeping,  or  require 
solving  large  linear  programs,  which  makes  them  impractical  for  wide-area  Internet  routing. 

Perhaps the work closest in its goals to ours is that of Gkantsidis et al. [44]. Using
arguments from max-flow min-cut theory, their paper shows that graphs obeying power law degree
distribution have good expansion properties in that they allow routing with $O(n \log^2 n)$ congestion,
which is close to the optimal value of $O(n \log n)$ achieved by regular expanders. In a follow-up
paper, Mihail et al. [74] prove similar results on the expansion properties of graphs generated using
the Preferential Connectivity model. The results presented in this chapter, in contrast with these
two contemporary papers, focus specifically on commonly-used routing algorithms such as policy
routing and shortest path routing. We show that power law graphs exhibit poor scaling properties
with respect to these routing algorithms.

7.6  Analysis  Caveats,  Summary  of  Observation  and  their  Implications 

In this chapter, we addressed the question of how the worst congestion in Internet-like graphs
(specifically at the AS-level) scales with the graph size. The key observations are shown in Table 7.1.
Using  a  combination  of  analytical  arguments  and  simulation  experiments,  we  showed  that  maximum 
congestion  scales  poorly  in  Internet-like  power  law  graphs.  Our  simulation  results  showed  that  the 
non-uniform  demand  distribution  between  nodes  only  exacerbates  the  congestion  scaling.  However, 
we  found,  surprisingly,  that  policy  routing  between  adjacent  ASes  may  not  worsen  the  congestion 
scaling  on  power  law  graphs  and  might,  in  fact,  be  marginally  better  when  compared  to  shortest-path 
routing. 

We note that with the current trend of the growth of the Internet it is possible that some locations

The expected maximum congestion in Internet-like Preferential Attachment power law
graphs with unit traffic demands and shortest path routing scales as $\Omega(n^{1+\Omega(1)})$, where $n$
is the number of nodes in the graph.

When non-uniform traffic matrices are considered, the congestion scaling properties
worsen significantly.

Policy routing results in similar, if not marginally better, congestion than shortest
path-based routing.

The poor congestion scaling properties of the Internet graph can be fixed using very
simple heuristics to alter the topology of the network. For example, adding parallel
edges between adjacent nodes in proportion to the minimum of their degrees can result
in a linear scaling of congestion.

Table 7.1: Congestion Scaling in the Internet: Summary of key observations regarding the scaling properties of
the Internet.


in the network might eventually become perpetual hot-spots. Fortunately, however, there is an
intuitively simple fix to this problem. Adding parallel links between adjacent nodes (ASes) in the
graph according to simple functions of their degrees will help the maximum congestion in the graph
scale linearly. In this case, it might not be necessary for the capacity of some links in the graph
to grow at a faster rate than the others. In fact, a natural evolution of link capacities according to
Moore's Law may be sufficient to accommodate the growing traffic demand in the network.

Next, we discuss important caveats in our analysis and simulations.

Analysis Caveats. We would like to mention that the results presented in this chapter may not hold
in general for all power law graphs. Our results (both simulation-based and analytical) are meant
for graphs representing Internet connectivity at the AS level. These results may not apply to power
law random graphs [1]. Note also that while the preferential connectivity model is known to yield
graphs with a similar degree distribution as the AS-level graph, it is not clear whether the model
accurately captures the AS-level connectivity dynamics (e.g., economic considerations for peering).
That said, our simulations on measured AS-level graphs show that our key observations hold for the
existing AS graph. Therefore, if the current dynamics of connectivity between ASes continue to
hold in the future, we can expect our results to hold for future AS-level graphs too.

Our analysis does not extend to router-level graphs. Analyzing the router-level topology is
much harder than analyzing the AS-level graph for three reasons: (1) Not much is known about
the topology of the Internet's router-level graph. Most existing maps of the Internet's router-level
topology (such as Rocketfuel maps [105, 28]) are considered incomplete; (2) IP-level routing cannot
be modeled easily using shortest path routing or simple inter-domain policy-based routing, since
this would require knowledge of traffic engineering employed by ASes in the Internet; (3) Finally,
some researchers have used power law graphs resulting from probabilistic models (such as power
law random graph models [1]) to approximate the router-level connectivity (see for example [109]).
However, recent work has shown that such models are error-prone since they do not explicitly
consider the technological and economic constraints or trade-offs behind router interconnections [66].
Graphs arising from such trade-offs are referred to as Heuristically Optimal Topologies. However,
there are no analytically-tractable models for generating such topologies. A thorough analysis of
the router-level interconnection is a challenging open problem.

The key results from our study of congestion scaling in the Internet graph may be simply
summarized as follows: The congestion along edges in the Internet graph is likely to scale poorly with
the growing size of the network. As a result, end-networks cannot employ routing-based
mechanisms such as route control to extract good performance from the future network. However, simple
heuristics for adding parallel edges between adjacent vertices in the graph can help significantly
improve the congestion scaling properties.

Chapter 8


Conclusions and Open Problems


In this chapter, we outline the contributions of this thesis and present questions for future
consideration.


8.1  Thesis  Summary 

This dissertation sought to answer the following central question: What routing-based mechanisms
can well-connected end-networks employ to improve their Internet access experience? Specifically,
we were interested in understanding if end-networks today required special support from the
Internet routing protocol suite or the routing infrastructure to obtain better performance. Our answer
to this question, in short, is No. It is sufficient for end-networks to simply make clever use of a
small number of routes per destination, as determined by the Internet's routing protocol. To achieve
better performance in this manner, end-networks do not require changes to the current routing
protocol, or support from special-purpose infrastructures, but can instead rely on purely end point-based
mechanisms.


8.2  Contributions 

This dissertation makes several important contributions. The foremost contributions are the analysis
of the benefits of multihoming route control and the implementation and evaluation of a simple,
practical route control system. In addition, we present the first-ever characterization of performance
bottlenecks inside ISP networks. We also study the impact of the growth of congestion at these
bottlenecks on future end-to-end performance. We present an overview of these contributions next.

8.2.1  Properties  of  wide-area  bottlenecks 

A key challenge in optimizing the Internet performance of well-connected end-networks is to
identify the heavily-loaded network links that constrain performance. For years, common knowledge of
wide-area bottleneck links was limited to folklore: these bottlenecks were widely believed to be
confined to the edges of Internet domains (e.g., peering links), where link utilization is supposedly
high.

In this dissertation, we conducted the first quantitative study of the characteristics of typical
wide-area bottleneck links. We probed a large number of paths between well-connected machines
located at universities and routers inside various carrier ISPs. We developed a suite of tools to
automatically identify and characterize bottleneck links along these paths. Using these tools, we
found that bottleneck links with very low available bandwidth are prevalent in the wide-area Internet,
especially inside or between small regional providers. We also deduced significant correlations
between the likelihood of wide-area links appearing as bottlenecks and the latencies of the links.
Finally, we observed that, contrary to popular perception, wide-area bottlenecks are almost evenly
split between peering and intra-domain links.

8.2.2 Benefits of multihoming route control and comparison with overlay routing

Although constrained bottlenecks exist in the wide-area Internet, the Internet's rich topology makes
it possible to "route around" them. Past studies for circumventing wide-area performance
bottlenecks advocated using overlay routing. In this approach, end-points can route their traffic via
intermediate "overlay" nodes deployed around the Internet. This helps end-points bypass the default
routes determined by the Internet's routing protocol (BGP), and avoid bottlenecks along these routes.

In contrast, we believed that a much simpler approach called Multihoming Route Control can
offer performance improvements similar to those of overlay routing. In this strategy, an end-network buys
connectivity from a few different ISPs and intelligently schedules its traffic across the ISPs. The
idea of multihoming route control was introduced a few years ago by commercial products such as
RouteScience and SockEye. However, little was known about the true extent of the benefits of these
products.

In this dissertation, we quantified the potential benefits of multihoming route control in
improving the Web download performance of Internet end-points, as well as their resilience to service
interruptions. Using Internet-scale experiments conducted in collaboration with Akamai
Technologies, we showed that by multihoming to three ISPs, and intelligently scheduling transfers across
the ISPs, an end-network could potentially improve its Internet response times, transfer speeds and
availability by up to 30%, relative to using a single ISP connection. We also showed that
employing more than 3 ISP connections offers little additional benefit, but the ISPs themselves must be
carefully selected in order to realize the maximum possible improvements.

Since multihoming route control is BGP routing-compliant, it cannot provide nearly the same
route selection flexibility as overlay routing. Using Internet-scale measurement, we showed that
despite this seemingly limited flexibility, multihoming route control offers only marginally inferior
Internet performance compared to overlay routing. For example, the transfer speeds from multihoming
to three ISPs are at most 10% inferior relative to overlay routing. This observation suggests that
there is definite hope of extracting good performance from the BGP protocol, and therefore, it is
unnecessary to replace BGP by a different protocol altogether.

8.2.3  Route  control  in  practice 

We extended the above work on the potential benefits of route control by implementing and
evaluating a multihomed route control system. We showed that, in practice, route control products
could employ very simple design and operational principles to extract nearly-optimal Internet
performance from multihoming. Using trace-driven emulations of an enterprise network with 3 ISP
connections, we showed that our proposed schemes could improve the Web performance of
multihomed end-networks by about 25% when compared to using a single ISP. Furthermore, the Web
performance from our proposals is at most 10% away from the best possible performance. We also
exposed important flaws in the design of current route control appliances. For example, we showed
that the conventional practice of employing historical measurement samples to monitor and predict
ISP performance could, in fact, result in sub-optimal performance.
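
As a rough illustration of the selection step, the sketch below keeps one smoothed response-time estimate per ISP and picks the ISP with the lowest estimate. The class name `RouteSelector`, the smoothing weight `alpha`, and the sample values are assumptions made for this example; they are not the system evaluated in this dissertation. With `alpha = 1.0` only the most recent sample counts, while a small `alpha` lets older samples dominate.

```python
class RouteSelector:
    """Hypothetical per-ISP estimator using an exponentially weighted moving average."""

    def __init__(self, isps, alpha=1.0):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        # No estimate exists for an ISP until its first sample arrives.
        self.estimate = {isp: None for isp in isps}

    def record(self, isp, response_ms):
        """Fold a new response-time sample (milliseconds) into the ISP's estimate."""
        previous = self.estimate[isp]
        if previous is None:
            self.estimate[isp] = float(response_ms)
        else:
            self.estimate[isp] = (
                self.alpha * response_ms + (1.0 - self.alpha) * previous
            )

    def best_isp(self):
        """Return the ISP with the lowest current estimate."""
        measured = {i: e for i, e in self.estimate.items() if e is not None}
        if not measured:
            raise RuntimeError("no samples recorded yet")
        return min(measured, key=measured.get)


def replay(alpha):
    """Feed the same made-up samples to a selector and return its choice."""
    selector = RouteSelector(["isp_a", "isp_b"], alpha=alpha)
    selector.record("isp_a", 100)  # isp_a starts out slow
    selector.record("isp_b", 50)
    selector.record("isp_a", 40)   # isp_a has just become the faster choice
    return selector.best_isp()
```

In this made-up trace, `replay(1.0)` follows the latest sample and switches to `isp_a`, whereas `replay(0.1)` is still dominated by the old sample and stays on `isp_b`. This is one simple way in which heavy reliance on history can lag behind current conditions.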

8.2.4  Congestion  scaling  at  bottlenecks 

The observations regarding multihoming showed that today's routing protocols and topology can
support good Internet performance. Over time, the Internet will grow in size and traffic volumes will
increase. At the same time, ISPs will upgrade network link capacities to accommodate the growing
traffic load. In this thesis, we observed that despite the improvement of link speeds in the future, the
Internet's topology and routing may, in fact, cause the load on certain links in the network to increase
at a much faster rate than on others. Such links could soon evolve into persistent bottlenecks, and,
in turn, significantly limit future Internet performance. Specifically, we analyzed a simple model of
the routing and topology of the Internet at an Autonomous System level (AS-level), and showed that
the congestion at key links in the network may grow as poorly as where n is the number of
ASes. This result implies that we may have to carefully alter the Internet's topology and/or routing
to guarantee good end-to-end performance in the future network. To this end, using large-scale
simulations, we showed that small fixes to the Internet's AS-level interconnections could drastically
reduce the congestion in the network. For example, we observed that adding parallel edges between
adjacent ASes in proportion to the minimum of their degrees can ensure good Internet performance.
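
The effect of the parallel-edge fix can be seen on a toy graph. The sketch below assumes a connected, unweighted graph, one unit of traffic between every pair of nodes, a single shortest path per pair, and an even split of load across the parallel copies of an edge. These assumptions, the function names, and the two-hub example graph are illustrative only and are far simpler than an AS-level model of the Internet.

```python
from collections import deque


def bfs_parents(adj, src):
    """Breadth-first search from src; returns each reached node's parent."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):  # sorted for deterministic tie-breaking
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def edge_loads(adj):
    """Count, per undirected edge, the node pairs whose shortest path uses it."""
    loads = {frozenset((u, v)): 0 for u in adj for v in adj[u]}
    nodes = sorted(adj)
    for src in nodes:
        parent = bfs_parents(adj, src)
        for dst in nodes:
            if dst <= src:
                continue  # count each unordered pair once
            node = dst
            while parent[node] is not None:  # walk back from dst to src
                loads[frozenset((node, parent[node]))] += 1
                node = parent[node]
    return loads


def max_congestion(adj, parallel=False):
    """Worst per-link load; optionally split load over min-degree parallel copies."""
    worst = 0.0
    for edge, load in edge_loads(adj).items():
        u, v = tuple(edge)
        copies = min(len(adj[u]), len(adj[v])) if parallel else 1
        worst = max(worst, load / copies)
    return worst


def two_hub_graph(leaves_per_hub=3):
    """Two connected hubs, each with its own set of degree-1 leaves."""
    adj = {"h1": {"h2"}, "h2": {"h1"}}
    for hub, prefix in (("h1", "a"), ("h2", "b")):
        for i in range(leaves_per_hub):
            leaf = f"{prefix}{i}"
            adj[hub].add(leaf)
            adj[leaf] = {hub}
    return adj
```

On `two_hub_graph()`, the hub-to-hub edge carries 16 of the 28 node pairs, so `max_congestion(adj)` is 16. With `parallel=True` that edge is split into min(4, 4) = 4 copies carrying 4 pairs each, and the worst link becomes a leaf edge with load 7.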

8.3  Lessons  for  the  Longer  Term 

Our measurement of the benefits of multihoming route control established the efficacy of the
technique in the current Internet. At the same time, we believe that certain key observations from our
study of multihoming will continue to be applicable for future Internet architectures. For example,
this dissertation argues for richer first-hop connectivity at network end-points. Irrespective of how
we redesign the Internet of the future, we believe that this basic principle will always offer several
advantages to endpoints, such as the freedom to route traffic via the ISP of their choice and the
ability to optimize for specific performance metrics without relying on ISPs to provide the necessary
knobs. Our work also shows that end-to-end performance optimization does not have to be coupled
with optimizing the behavior of the routing protocol itself. Therefore, research on the latter problem can
progress in parallel with the development of tools for endpoint route control. Another lesson we
learn is that while it would be nice for the Internet's routing protocol to be performance-aware, all
we really need, at least from the perspective of optimizing end-to-end performance, is a protocol
that can provide us with a reasonable choice of routes to select from. The current BGP protocol
is quite effective at supporting such an interface for choosing routes. Of course, BGP suffers from
other pathologies, like lack of security, unpredictability and slow reconvergence, which need to be
addressed separately in our quest for a better wide-area routing protocol for the future.

8.4  Future  Work 

In what follows, we discuss key issues left open by this dissertation. Where applicable, we outline
possible approaches to address them.

8.4.1  Longer-term  Measurement  Analyses 

In our study of wide-area Internet bottlenecks, we relied on measurements collected over short
time-scales to derive the likelihood of links of various types appearing as bottlenecks. The most natural
extension is to investigate the "persistence" of bottlenecks: On what time-scales does congestion on
wide-area bottlenecks change? Similarly, our observations regarding the benefits of route control
are based on data collected over a week-long period. It is useful to complement these observations
with analyses of changes in ISP performance over longer time-scales, e.g., 6 months or 1 year. This
would also help us understand if the choice of the best set of ISPs for multihoming is likely to change
over long time periods, and if so, what the impact on the subscriber performance (and subscription
cost) may be.

8.4.2  Explaining  Diminishing  Returns 

In Section 4.3, we showed that there is little benefit from multihoming to more than three ISPs.
Informally, this can be explained by the fact that a fourth ISP can add only a limited amount of
diversity to that already provided by three well-chosen ISPs. We believe that the diminishing
returns can be explained formally by analyzing the hierarchical nature of ISP interconnections. As
explained in Section 3.1.3, the ISP hierarchy is composed of 5 tiers. Further, the highest tier consists
of under 20 ISPs, with vast global reach. This implies that typical end-to-end paths traverse one or
more tier-1 ISPs with a high likelihood. Another implication is that paths via distinct ISPs (say of
tiers 2 or 3) to random Internet destinations are very likely to merge in a tier-1 network. The greater
the degree of overlap (in terms of the number of routers or ISPs), the lesser the benefit of employing
multiple ISPs. We believe that it is possible to model the likelihood for end-to-end paths to overlap
with each other, and then, to extend this model to explain diminishing returns from multihoming.
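
One simple way to quantify the degree of overlap, offered here only as a sketch, is the Jaccard similarity of the intermediate ASes on two paths to the same destination. The function names and the example AS paths below are hypothetical, and the choice of Jaccard similarity is an assumption for illustration rather than a metric used in this dissertation.

```python
from itertools import combinations


def overlap(path_a, path_b):
    """Jaccard overlap of the intermediate ASes on two paths.

    Each path is a list of AS names from source to destination; the two
    endpoints are excluded because they are shared by construction.
    """
    mid_a, mid_b = set(path_a[1:-1]), set(path_b[1:-1])
    union = mid_a | mid_b
    if not union:
        return 0.0
    return len(mid_a & mid_b) / len(union)


def mean_pairwise_overlap(paths):
    """Average overlap over all pairs of candidate paths."""
    pairs = list(combinations(paths, 2))
    if not pairs:
        return 0.0
    return sum(overlap(a, b) for a, b in pairs) / len(pairs)


# Hypothetical AS-level paths from one client to one server via three ISPs.
via_isp1 = ["client", "isp1", "tier1_x", "dst_isp", "server"]
via_isp2 = ["client", "isp2", "tier1_x", "dst_isp", "server"]
via_isp3 = ["client", "isp3", "tier1_y", "dst_isp", "server"]
```

In this example the first two paths merge in the same tier-1 network and overlap more with each other than either does with the third. A model of diminishing returns could start from how such overlap values are distributed as additional ISPs are added.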

8.4.3  Global  Effects  of  Multihoming  Route  Control 

We mentioned in Section 6.4 that Qiu et al. [45] study the interactions between multiple route control
agents at equilibrium. The authors conclude that the impact of individual route control actions on
global RTT performance is minimal, assuming an equilibrium exists. This work leaves several
issues open for consideration. For example, can route controlled traffic attain an equilibrium at all?
What are the dynamics of interactions between route control agents at, or when converging to, an
equilibrium? Does widespread deployment of route control result in adverse interactions with ISP
traffic engineering by making traffic load due to any single end-network vastly unpredictable?

Further, ISPs may alter their pricing structures in response to route control, in order to "smooth
out" the traffic and/or to attract more customers. It is unclear if this diminishes the benefits from
route control that we identified in this study. Finally, as route control is more widely adopted, it is
open to debate if the best strategy for a new end-network is to stay connected to the best single ISP,
or to itself adopt route control.

We would like to mention here that any potential negative effects of wide-spread deployment
of route control, such as traffic fluctuations, could be mitigated, to a certain extent, by employing
"third-party route control services", such as Internap Inc. [53]. In this model, a third-party provider
servicing a large metropolitan area buys high-capacity uplinks to several large ISPs. End-customers
buy a connectivity service from the third-party provider, whereby they hand their traffic over to the
service provider, which in turn selects one of its many uplinks to send it over. In making a choice of
which ISP to send over, the service provider could optimize for several goals such as load-balancing
across ISP uplinks and minimizing traffic fluctuations on any single ISP link.
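
A minimal sketch of the load-balancing goal follows, assuming that the provider knows each flow's demand and each uplink's capacity in advance. It places flows greedily, largest first, on the uplink whose utilization would be lowest after the placement. The function name, the greedy rule, and the numbers are assumptions for illustration; they do not describe how any actual provider operates.

```python
def assign_flows(flows_mbps, capacity_mbps):
    """Greedily place flows on uplinks to keep utilization balanced.

    flows_mbps:    list of flow demands in Mbps.
    capacity_mbps: dict mapping uplink name to capacity in Mbps.
    Returns (placement, load): a list of (demand, uplink) pairs and the
    resulting load per uplink in Mbps.
    """
    load = {uplink: 0.0 for uplink in capacity_mbps}
    placement = []
    # Placing the largest flows first leaves the small ones to even things out.
    for demand in sorted(flows_mbps, reverse=True):
        uplink = min(
            load, key=lambda u: (load[u] + demand) / capacity_mbps[u]
        )
        load[uplink] += demand
        placement.append((demand, uplink))
    return placement, load


# Hypothetical demands from four customers and two equal-capacity uplinks.
placement, load = assign_flows(
    [40, 30, 20, 10], {"isp_a": 100, "isp_b": 100}
)
```

With these made-up inputs both uplinks end up carrying 50 Mbps. A real service would also need to account for the performance of each ISP toward the destination, which this sketch ignores.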

From the perspective of ISPs, this approach is better than allowing individual route control at
endpoints since it ensures that the multihomed traffic aggregate from a metropolitan area behaves
in a more predictable, smooth fashion. Also, in comparison with overlay service providers, route
control providers do not run the risk of violating routing policies. As a result, it may be both easier to
deploy such services and cheaper to subscribe to them in comparison with overlays.

At the same time, in comparison with endpoint control, the third-party route control approach
limits the flexibility of endpoints. For example, endpoints can no longer optimize for specific
performance goals, such as throughput as opposed to latency. Further, if endpoints were to coordinate ISP
and route selection with their remote destinations (this is common if the two ends are branch
offices of a single parent organization), then allowing a third-party service provider to control routing
may offer inferior performance compared to allowing endpoints full control.

8.4.4  New  Changes  to  BGP

In this dissertation, we showed that BGP routes can support good performance in the Internet today.
We also observed, in Section 5.2.1.2, that cooperative inter-domain traffic exchange between ISPs
can further reduce the differences between route control and overlay routing. There are several
approaches to encouraging cooperation between neighboring ISPs while ensuring that the ISPs do not
reveal anything about their internal structure to their neighbors (see, for example, [68]). However,
these approaches work only for pairs of neighboring ISPs. An interesting open issue is to understand
whether these techniques can be extended to end-to-end paths, and what functional changes to BGP
this might require. It is possible that such changes to BGP may completely eliminate the differences
between overlay routing and BGP-based path selection.

8.4.5  Better  Models  for  Congestion  Scaling 

Our analysis of the scaling of congestion in the Internet graph is limited to the interconnections
between ISPs. It is unclear if the ISP interconnection graph is likely to evolve so as to maintain a
power law structure. As large ISPs acquire and merge with other small or large ISPs, it is likely
that, in the future Internet, ISPs will belong to one of two kinds: large ISPs which own and operate
backbones (such as current tier-1 ISPs), and very small ISPs (such as current tier-4 ISPs) catering to
home or DSL connections. It would be interesting to study the scaling properties of these alternate
interconnections.

Also, as we stated in Section 7.6, our analysis of congestion scaling does not extend to the
Internet's router-level graph. We believe that it is possible to develop simple approximations to the
router-level topology, and investigate congestion scaling using either analysis or simulations.
Further, our analysis neglects the impact of technologies such as content distribution networks (CDNs).
By serving data from locations close to end-clients, CDNs may contribute to alleviating traffic load
from key non-access Internet links. This effect can be modeled in our analysis by forcing path
lengths to be upper-bounded by a small constant.